提交 · 8da007348bf52a91e5137d27d7dcd528edbb80ce · openeuler / Kernel

13 2月, 2017 9 次提交

mmc: core: First step in cleaning up public mmc header files · 8da00734

由 Ulf Hansson 提交于 1月 13, 2017

This is the first step in cleaning up the public mmc header files. In this
change we makes sure each header file builds standalone, as that helps to
resolve dependencies.

While changing this, it also seems reasonable to stop including other
headers from inside a header itself which it don't depend upon.
Additionally, in some cases such dependencies are better resolved by
forward declaring the needed struct.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Reviewed-by: NShawn Lin <shawn.lin@rock-chips.com>

8da00734

mmc: mxcmmc: Include interrupt.h in the platform data header · 8336bf68

由 Ulf Hansson 提交于 1月 11, 2017

The mxcmmc platform data header depends on interrupt.h. Don't rely on the
public mmc header host.h to include it, bud instead make that dependency
explicit.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

8336bf68

mmc: dw_mmc: Remove the public dw_mmc header file · 0f21c58c

由 Ulf Hansson 提交于 12月 30, 2016

There are currently no external users of the public dw_mmc header file,
except the dw_mmc driver itself. Therefore let's move the definitions from
the public dw_mmc header file into the existing private dw_mmc header file
and then remove the public one.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>

0f21c58c

mmc: sh_mmcif: Remove unused clk_ctrl2_present from the platform data · dba4bb48

由 Ulf Hansson 提交于 12月 30, 2016

There are currently no users of the clk_ctrl2_present member from the
platform data, so let's remove it.

Note, as some of the sh_mmcif variants may support clk_ctrl2, let's keep
the current code in the driver, which deals with this. For future support,
we should invent a DT binding instead, but let's leave that until it's
needed.

Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>

dba4bb48

mmc: sh_mmcif: Remove unused ccs_unsupported from the platform data · 8020f711

由 Ulf Hansson 提交于 12月 30, 2016

There are currently no users of the ccs_unsupported member from the
platform data, so let's remove it.

Note, as some of the sh_mmcif variants may not support ccs, let's keep the
current code in the driver, which deals with this. For future support, we
should invent a DT binding instead, but let's leave that until it's needed.

Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>

8020f711

mmc: sh_mmcif: Remove unused ->get_cd() platform callback · 5957eeba

由 Ulf Hansson 提交于 12月 30, 2016

Removing the callback also enables us to remove the sh_mmcif_get_cd()
altogether, as we convert to use mmc_gpio_get_cd() to the same kind of
work.

Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

5957eeba

mmc: sh_mmcif: Remove unused use_cd_gpio/cd_gpio from platform data · 95cc4df7

由 Ulf Hansson 提交于 12月 30, 2016

Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>

95cc4df7

mmc: Removed the unused public mmc boot.h header · 13f00f9f

由 Ulf Hansson 提交于 12月 30, 2016

Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>

13f00f9f

mmc: core: Export device lifetime information through sysfs · 46bc5c40

由 Jungseung Lee 提交于 12月 22, 2016

In the eMMC 5.0 version of the spec, several EXT_CSD fields about
device lifetime are added.

 - Two types of estimated indications reflected by averaged wear out of memory
 - An indication reflected by average reserved blocks

Export the information through sysfs.
Signed-off-by: NJungseung Lee <js07.lee@samsung.com>
Reviewed-by: NJaehoon Chung <jh80.chung@samsung.com>
Reviewed-by: NShawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>

46bc5c40

09 2月, 2017 3 次提交

cpumask: use nr_cpumask_bits for parsing functions · 4d59b6cc

由 Tejun Heo 提交于 2月 08, 2017

Commit 513e3d2d ("cpumask: always use nr_cpu_ids in formatting and
parsing functions") converted both cpumask printing and parsing
functions to use nr_cpu_ids instead of nr_cpumask_bits.  While this was
okay for the printing functions as it just picked one of the two output
formats that we were alternating between depending on a kernel config,
doing the same for parsing wasn't okay.

nr_cpumask_bits can be either nr_cpu_ids or NR_CPUS.  We can always use
nr_cpu_ids but that is a variable while NR_CPUS is a constant, so it can
be more efficient to use NR_CPUS when we can get away with it.
Converting the printing functions to nr_cpu_ids makes sense because it
affects how the masks get presented to userspace and doesn't break
anything; however, using nr_cpu_ids for parsing functions can
incorrectly leave the higher bits uninitialized while reading in these
masks from userland.  As all testing and comparison functions use
nr_cpumask_bits which can be larger than nr_cpu_ids, the parsed cpumasks
can erroneously yield false negative results.

This made the taskstats interface incorrectly return -EINVAL even when
the inputs were correct.

Fix it by restoring the parse functions to use nr_cpumask_bits instead
of nr_cpu_ids.

Link: http://lkml.kernel.org/r/20170206182442.GB31078@htj.duckdns.org
Fixes: 513e3d2d ("cpumask: always use nr_cpu_ids in formatting and parsing functions")
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NMartin Steigerwald <martin.steigerwald@teamix.de>
Debugged-by: NBen Hutchings <ben.hutchings@codethink.co.uk>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4d59b6cc

mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers · 0911d004

由 Jan Kara 提交于 2月 08, 2017

Some ->page_mkwrite handlers may return VM_FAULT_RETRY as its return
code (GFS2 or Lustre can definitely do this).  However VM_FAULT_RETRY
from ->page_mkwrite is completely unhandled by the mm code and results
in locking and writeably mapping the page which definitely is not what
the caller wanted.

Fix Lustre and block_page_mkwrite_ret() used by other filesystems
(notably GFS2) to return VM_FAULT_NOPAGE instead which results in
bailing out from the fault code, the CPU then retries the access, and we
fault again effectively doing what the handler wanted.

Link: http://lkml.kernel.org/r/20170203150729.15863-1-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz>
Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: NJinshan Xiong <jinshan.xiong@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0911d004

net: introduce device min_header_len · 217e6fa2

由 Willem de Bruijn 提交于 2月 07, 2017

The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.

Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.

Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.

Fixes: 9ed988cd ("packet: validate variable length ll headers")
Reported-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

217e6fa2

04 2月, 2017 4 次提交

base/memory, hotplug: fix a kernel oops in show_valid_zones() · a96dfddb

由 Toshi Kani 提交于 2月 03, 2017

Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB.  [1] An example of such
systems is desribed below.  0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range.  show_valid_zones() then proceeds with the valid range.

[1] 'Commit bdee237c ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.comSigned-off-by: NToshi Kani <toshi.kani@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>	[4.4+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a96dfddb

log2: make order_base_2() behave correctly on const input value zero · 29905b52

由 Ard Biesheuvel 提交于 2月 02, 2017

The function order_base_2() is defined (according to the comment block)
as returning zero on input zero, but subsequently passes the input into
roundup_pow_of_two(), which is explicitly undefined for input zero.

This has gone unnoticed until now, but optimization passes in GCC 7 may
produce constant folded function instances where a constant value of
zero is passed into order_base_2(), resulting in link errors against the
deliberately undefined '____ilog2_NaN'.

So update order_base_2() to adhere to its own documented interface.

[ See

     http://marc.info/?l=linux-kernel&m=147672952517795&w=2

  and follow-up discussion for more background. The gcc "optimization
  pass" is really just broken, but now the GCC trunk problem seems to
  have escaped out of just specially built daily images, so we need to
  work around it in mainline.    - Linus ]
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

29905b52

module: unify absolute krctab definitions for 32-bit and 64-bit · 4b9eee96

由 Ard Biesheuvel 提交于 2月 03, 2017

The previous patch introduced a separate inline asm version of the
krcrctab declaration template for use with 64-bit architectures, which
cannot refer to ELF symbols using 32-bit quantities.

This declaration should be equivalent to the C one for 32-bit
architectures, but just in case - unify them in a separate patch, which
can simply be dropped if it turns out to break anything.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4b9eee96

modversions: treat symbol CRCs as 32 bit quantities · 71810db2

由 Ard Biesheuvel 提交于 2月 03, 2017

The modversion symbol CRCs are emitted as ELF symbols, which allows us
to easily populate the kcrctab sections by relying on the linker to
associate each kcrctab slot with the correct value.

This has a couple of downsides:

 - Given that the CRCs are treated as memory addresses, we waste 4 bytes
   for each CRC on 64 bit architectures,

 - On architectures that support runtime relocation, a R_<arch>_RELATIVE
   relocation entry is emitted for each CRC value, which identifies it
   as a quantity that requires fixing up based on the actual runtime
   load offset of the kernel. This results in corrupted CRCs unless we
   explicitly undo the fixup (and this is currently being handled in the
   core module code)

 - Such runtime relocation entries take up 24 bytes of __init space
   each, resulting in a x8 overhead in [uncompressed] kernel size for
   CRCs.

Switching to explicit 32 bit values on 64 bit architectures fixes most
of these issues, given that 32 bit values are not treated as quantities
that require fixing up based on the actual runtime load offset.  Note
that on some ELF64 architectures [such as PPC64], these 32-bit values
are still emitted as [absolute] runtime relocatable quantities, even if
the value resolves to a build time constant.  Since relative relocations
are always resolved at build time, this patch enables MODULE_REL_CRCS on
powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC
references into relative references into .rodata where the actual CRC
value is stored.

So redefine all CRC fields and variables as u32, and redefine the
__CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using
inline assembler (which is necessary since 64-bit C code cannot use
32-bit types to hold memory addresses, even if they are ultimately
resolved using values that do not exceed 0xffffffff).  To avoid
potential problems with legacy 32-bit architectures using legacy
toolchains, the equivalent C definition of the kcrctab entry is retained
for 32-bit architectures.

Note that this mostly reverts commit d4703aef ("module: handle ppc64
relocating kcrctabs when CONFIG_RELOCATABLE=y")
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

71810db2

02 2月, 2017 1 次提交

net: fix ndo_features_check/ndo_fix_features comment ordering · 1a2a1444

由 Dimitris Michailidis 提交于 1月 31, 2017

Commit cdba756f ("net: move ndo_features_check() close to
ndo_start_xmit()") inadvertently moved the doc comment for
.ndo_fix_features instead of .ndo_features_check. Fix the comment
ordering.

Fixes: cdba756f ("net: move ndo_features_check() close to ndo_start_xmit()")
Signed-off-by: NDimitris Michailidis <dmichail@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a2a1444

01 2月, 2017 3 次提交

perf/x86/intel/uncore: Make package handling more robust · fff4b87e

由 Thomas Gleixner 提交于 1月 31, 2017

The package management code in uncore relies on package mapping being
available before a CPU is started. This changed with:

  9d85eb91 ("x86/smpboot: Make logical package management more robust")

because the ACPI/BIOS information turned out to be unreliable, but that
left uncore in broken state. This was not noticed because on a regular boot
all CPUs are online before uncore is initialized.

Move the allocation to the CPU online callback and simplify the hotplug
handling. At this point the package mapping is established and correct.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Fixes: 9d85eb91 ("x86/smpboot: Make logical package management more robust")
Link: http://lkml.kernel.org/r/20170131230141.377156255@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

fff4b87e

perf/x86/intel/rapl: Make package handling more robust · dd86e373

由 Thomas Gleixner 提交于 1月 31, 2017

The package management code in RAPL relies on package mapping being
available before a CPU is started. This changed with:

  9d85eb91 ("x86/smpboot: Make logical package management more robust")

because the ACPI/BIOS information turned out to be unreliable, but that
left RAPL in broken state. This was not noticed because on a regular boot
all CPUs are online before RAPL is initialized.

A possible fix would be to reintroduce the mess which allocates a package
data structure in CPU prepare and when it turns out to already exist in
starting throw it away later in the CPU online callback. But that's a
horrible hack and not required at all because RAPL becomes functional for
perf only in the CPU online callback. That's correct because user space is
not yet informed about the CPU being onlined, so nothing caan rely on RAPL
being available on that particular CPU.

Move the allocation to the CPU online callback and simplify the hotplug
handling. At this point the package mapping is established and correct.

This also adds a missing check for available package data in the
event_init() function.
Reported-by: NYasuaki Ishimatsu <yasu.isimatu@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 9d85eb91 ("x86/smpboot: Make logical package management more robust")
Link: http://lkml.kernel.org/r/20170131230141.212593966@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

dd86e373

fscache: Fix dead object requeue · e26bfebd

由 David Howells 提交于 1月 31, 2017

Under some circumstances, an fscache object can become queued such that it
fscache_object_work_func() can be called once the object is in the
OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
invoke the handler for the state (which is hard coded to 0x2).

The way this comes about is something like the following:

 (1) The object dispatcher is processing a work state for an object.  This
     is done in workqueue context.

 (2) An out-of-band event comes in that isn't masked, causing the object to
     be queued, say EV_KILL.

 (3) The object dispatcher finishes processing the current work state on
     that object and then sees there's another event to process, so,
     without returning to the workqueue core, it processes that event too.
     It then follows the chain of events that initiates until we reach
     OBJECT_DEAD without going through a wait state (such as
     WAIT_FOR_CLEARANCE).

     At this point, object->events may be 0, object->event_mask will be 0
     and oob_event_mask will be 0.

 (4) The object dispatcher returns to the workqueue processor, and in due
     course, this sees that the object's work item is still queued and
     invokes it again.

 (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
     jumps to it - resulting in an OOPS.

When I'm seeing this, the work state in (1) appears to have been either
LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
fscache_osm_lookup_oob).

The window for (2) is very small:

 (A) object->event_mask is cleared whilst the event dispatch process is
     underway - though there's no memory barrier to force this to the top
     of the function.

     The window, therefore is from the time the object was selected by the
     workqueue processor and made requeueable to the time the mask was
     cleared.

 (B) fscache_raise_event() will only queue the object if it manages to set
     the event bit and the corresponding event_mask bit was set.

     The enqueuement is then deferred slightly whilst we get a ref on the
     object and get the per-CPU variable for workqueue congestion.  This
     slight deferral slightly increases the probability by allowing extra
     time for the workqueue to make the item requeueable.

Handle this by giving the dead state a processor function and checking the
for the dead state address rather than seeing if the processor function is
address 0x2.  The dead state processor function can then set a flag to
indicate that it's occurred and give a warning if it occurs more than once
per object.

If this race occurs, an oops similar to the following is seen (note the RIP
value):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [<0000000000000002>] 0x1
PGD 0
Oops: 0010 [#1] SMP
Modules linked in: ...
CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
RSP: 0018:ffff880717547df8  EFLAGS: 00010202
RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
 ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
 ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
Call Trace:
 [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache]
 [<ffffffff8109d5db>] process_one_work+0x17b/0x470
 [<ffffffff8109e4ac>] worker_thread+0x21c/0x400
 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
 [<ffffffff810a5acf>] kthread+0xcf/0xe0
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff816460d8>] ret_from_fork+0x58/0x90
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NJeremy McNicoll <jeremymc@redhat.com>
Tested-by: NFrank Sorenson <sorenson@redhat.com>
Tested-by: NBenjamin Coddington <bcodding@redhat.com>
Reviewed-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e26bfebd

31 1月, 2017 1 次提交

Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read() · 433e19cf

由 Dexuan Cui 提交于 1月 28, 2017

Commit a389fcfd ("Drivers: hv: vmbus: Fix signaling logic in
hv_need_to_signal_on_read()")
added the proper mb(), but removed the test "prev_write_sz < pending_sz"
when making the signal decision.

As a result, the guest can signal the host unnecessarily,
and then the host can throttle the guest because the host
thinks the guest is buggy or malicious; finally the user
running stress test can perceive intermittent freeze of
the guest.

This patch brings back the test, and properly handles the
in-place consumption APIs used by NetVSC (see get_next_pkt_raw(),
put_pkt_raw() and commit_rd_index()).

Fixes: a389fcfd ("Drivers: hv: vmbus: Fix signaling logic in
hv_need_to_signal_on_read()")
Signed-off-by: NDexuan Cui <decui@microsoft.com>
Reported-by: NRolf Neugebauer <rolf.neugebauer@docker.com>
Tested-by: NRolf Neugebauer <rolf.neugebauer@docker.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

433e19cf

30 1月, 2017 2 次提交

irqdomain: Avoid activating interrupts more than once · 08d85f3e

由 Marc Zyngier 提交于 1月 17, 2017

Since commit f3b0946d ("genirq/msi: Make sure PCI MSIs are
activated early"), we can end-up activating a PCI/MSI twice (once
at allocation time, and once at startup time).

This is normally of no consequences, except that there is some
HW out there that may misbehave if activate is used more than once
(the GICv3 ITS, for example, uses the activate callback
to issue the MAPVI command, and the architecture spec says that
"If there is an existing mapping for the EventID-DeviceID
combination, behavior is UNPREDICTABLE").

While this could be worked around in each individual driver, it may
make more sense to tackle the issue at the core level. In order to
avoid getting in that situation, let's have a per-interrupt flag
to remember if we have already activated that interrupt or not.

Fixes: f3b0946d ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-and-tested-by: NAndre Przywara <andre.przywara@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1484668848-24361-1-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

08d85f3e

can: Fix kernel panic at security_sock_rcv_skb · f1712c73

由 Eric Dumazet 提交于 1月 27, 2017

Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()

The main problem seems that the sockets themselves are not RCU
protected.

If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.

Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.

[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0

Call Trace:
 <IRQ>
 [<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60
 [<ffffffff81d55771>] sk_filter+0x41/0x210
 [<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0
 [<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0
 [<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370
 [<ffffffff81f07af9>] can_receive+0xd9/0x120
 [<ffffffff81f07beb>] can_rcv+0xab/0x100
 [<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0
 [<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0
 [<ffffffff81d37f67>] process_backlog+0x127/0x280
 [<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0
 [<ffffffff810c88d4>] __do_softirq+0x184/0x440
 [<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30
 <EOI>
 [<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40
 [<ffffffff810c8bed>] do_softirq+0x1d/0x20
 [<ffffffff81d30085>] netif_rx_ni+0xe5/0x110
 [<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520
 [<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230
 [<ffffffff810e3baf>] process_one_work+0x24f/0x670
 [<ffffffff810e44ed>] worker_thread+0x9d/0x6f0
 [<ffffffff810e4450>] ? rescuer_thread+0x480/0x480
 [<ffffffff810ebafc>] kthread+0x12c/0x150
 [<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70
Reported-by: NZhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NOliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f1712c73

28 1月, 2017 2 次提交

percpu-refcount: fix reference leak during percpu-atomic transition · 966d2b04

由 Douglas Miller 提交于 1月 28, 2017

percpu_ref_tryget() and percpu_ref_tryget_live() should return
"true" IFF they acquire a reference. But the return value from
atomic_long_inc_not_zero() is a long and may have high bits set,
e.g. PERCPU_COUNT_BIAS, and the return value of the tryget routines
is bool so the reference may actually be acquired but the routines
return "false" which results in a reference leak since the caller
assumes it does not need to do a corresponding percpu_ref_put().

This was seen when performing CPU hotplug during I/O, as hangs in
blk_mq_freeze_queue_wait where percpu_ref_kill (blk_mq_freeze_queue_start)
raced with percpu_ref_tryget (blk_mq_timeout_work).
Sample stack trace:

__switch_to+0x2c0/0x450
__schedule+0x2f8/0x970
schedule+0x48/0xc0
blk_mq_freeze_queue_wait+0x94/0x120
blk_mq_queue_reinit_work+0xb8/0x180
blk_mq_queue_reinit_prepare+0x84/0xa0
cpuhp_invoke_callback+0x17c/0x600
cpuhp_up_callbacks+0x58/0x150
_cpu_up+0xf0/0x1c0
do_cpu_up+0x120/0x150
cpu_subsys_online+0x64/0xe0
device_online+0xb4/0x120
online_store+0xb4/0xc0
dev_attr_store+0x68/0xa0
sysfs_kf_write+0x80/0xb0
kernfs_fop_write+0x17c/0x250
__vfs_write+0x6c/0x1e0
vfs_write+0xd0/0x270
SyS_write+0x6c/0x110
system_call+0x38/0xe0

Examination of the queue showed a single reference (no PERCPU_COUNT_BIAS,
and __PERCPU_REF_DEAD, __PERCPU_REF_ATOMIC set) and no requests.
However, conditions at the time of the race are count of PERCPU_COUNT_BIAS + 0
and __PERCPU_REF_DEAD and __PERCPU_REF_ATOMIC set.

The fix is to make the tryget routines use an actual boolean internally instead
of the atomic long result truncated to a int.

Fixes: e625305b percpu-refcount: make percpu_ref based on longs instead of ints
Link: https://bugzilla.kernel.org/show_bug.cgi?id=190751Signed-off-by: NDouglas Miller <dougmill@linux.vnet.ibm.com>
Reviewed-by: NJens Axboe <axboe@fb.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Fixes: e625305b ("percpu-refcount: make percpu_ref based on longs instead of ints")
Cc: stable@vger.kernel.org # v3.18+

966d2b04

net: phy: micrel: add support for KSZ8795 · 9d162ed6

由 Sean Nyekjaer 提交于 1月 27, 2017

This is adds support for the PHYs in the KSZ8795 5port managed switch.

It will allow to detect the link between the switch and the soc
and uses the same read_status functions as the KSZ8873MLL switch.
Signed-off-by: NSean Nyekjaer <sean.nyekjaer@prevas.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d162ed6

26 1月, 2017 2 次提交

net: phy: leds: Fix truncated LED trigger names · 3c880eb0

由 Geert Uytterhoeven 提交于 1月 25, 2017

Commit 4567d686 ("phy: increase size of MII_BUS_ID_SIZE and
bus_id") increased the size of MII bus IDs, but forgot to update the
private definition in <linux/phy_led_triggers.h>.
This may cause:
  1. Truncation of LED trigger names,
  2. Duplicate LED trigger names,
  3. Failures registering LED triggers,
  4. Crashes due to bad error handling in the LED trigger failure path.

To fix this, and prevent the definitions going out of sync again in the
future, let the PHY LED trigger code use the existing MII_BUS_ID_SIZE
definition.

Example:
  - Before I had triggers "ee700000.etherne:01:100Mbps" and
    "ee700000.etherne:01:10Mbps",
  - After the increase of MII_BUS_ID_SIZE, both became
    "ee700000.ethernet-ffffffff:01:" => FAIL,
  - Now, the triggers are "ee700000.ethernet-ffffffff:01:100Mbps" and
    "ee700000.ethernet-ffffffff:01:10Mbps", which are unique again.

Fixes: 4567d686 ("phy: increase size of MII_BUS_ID_SIZE and bus_id")
Fixes: 2e0bc452 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c880eb0

net: phy: leds: Break dependency of phy.h on phy_led_triggers.h · d6f8cfa3

由 Geert Uytterhoeven 提交于 1月 25, 2017

<linux/phy.h> includes <linux/phy_led_triggers.h>, which is not really
needed.  Drop the include from <linux/phy.h>, and add it to all users
that didn't include it explicitly.
Suggested-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6f8cfa3

25 1月, 2017 5 次提交

mm, page_alloc: fix check for NULL preferred_zone · ea57485a

由 Vlastimil Babka 提交于 1月 24, 2017

Patch series "fix premature OOM regression in 4.7+ due to cpuset races".

This is v2 of my attempt to fix the recent report based on LTP cpuset
stress test [1].  The intention is to go to stable 4.9 LTSS with this,
as triggering repeated OOMs is not nice.  That's why the patches try to
be not too intrusive.

Unfortunately why investigating I found that modifying the testcase to
use per-VMA policies instead of per-task policies will bring the OOM's
back, but that seems to be much older and harder to fix problem.  I have
posted a RFC [2] but I believe that fixing the recent regressions has a
higher priority.

Longer-term we might try to think how to fix the cpuset mess in a better
and less error prone way.  I was for example very surprised to learn,
that cpuset updates change not only task->mems_allowed, but also
nodemask of mempolicies.  Until now I expected the parameter to
alloc_pages_nodemask() to be stable.  I wonder why do we then treat
cpusets specially in get_page_from_freelist() and distinguish HARDWALL
etc, when there's unconditional intersection between mempolicy and
cpuset.  I would expect the nodemask adjustment for saving overhead in
g_p_f(), but that clearly doesn't happen in the current form.  So we
have both crazy complexity and overhead, AFAICS.

[1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
[2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz

This patch (of 4):

Since commit c33d6c06 ("mm, page_alloc: avoid looking up the first
zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
which can theoretically happen due to concurrent cpuset modification.  We
check the zoneref pointer which is never NULL and we should check the zone
pointer.  Also document this in first_zones_zonelist() comment per Michal
Hocko.

Fixes: c33d6c06 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ea57485a

kernel/watchdog: prevent false hardlockup on overloaded system · b94f5118

由 Don Zickus 提交于 1月 24, 2017

On an overloaded system, it is possible that a change in the watchdog
threshold can be delayed long enough to trigger a false positive.

This can easily be achieved by having a cpu spinning indefinitely on a
task, while another cpu updates watchdog threshold.

What happens is while trying to park the watchdog threads, the hrtimers
on the other cpus trigger and reprogram themselves with the new slower
watchdog threshold.  Meanwhile, the nmi watchdog is still programmed
with the old faster threshold.

Because the one cpu is blocked, it prevents the thread parking on the
other cpus from completing, which is needed to shutdown the nmi watchdog
and reprogram it correctly.  As a result, a false positive from the nmi
watchdog is reported.

Fix this by setting a park_in_progress flag to block all lockups until
the parking is complete.

Fix provided by Ulrich Obergfell.

[akpm@linux-foundation.org: s/park_in_progress/watchdog_park_in_progress/]
Link: http://lkml.kernel.org/r/1481041033-192236-1-git-send-email-dzickus@redhat.comSigned-off-by: NDon Zickus <dzickus@redhat.com>
Reviewed-by: NAaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b94f5118

memory_hotplug: make zone_can_shift() return a boolean value · 8a1f780e

由 Yasuaki Ishimatsu 提交于 1月 24, 2017

online_{kernel|movable} is used to change the memory zone to
ZONE_{NORMAL|MOVABLE} and online the memory.

To check that memory zone can be changed, zone_can_shift() is used.
Currently the function returns minus integer value, plus integer
value and 0. When the function returns minus or plus integer value,
it means that the memory zone can be changed to ZONE_{NORNAL|MOVABLE}.

But when the function returns 0, there are two meanings.

One of the meanings is that the memory zone does not need to be changed.
For example, when memory is in ZONE_NORMAL and onlined by online_kernel
the memory zone does not need to be changed.

Another meaning is that the memory zone cannot be changed. When memory
is in ZONE_NORMAL and onlined by online_movable, the memory zone may
not be changed to ZONE_MOVALBE due to memory online limitation(see
Documentation/memory-hotplug.txt). In this case, memory must not be
onlined.

The patch changes the return type of zone_can_shift() so that memory
online operation fails when memory zone cannot be changed as follows:

Before applying patch:
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  7864320
           managed  7864320
   # echo online_movable > memory4097/state
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  8388608
           managed  8388608

   online_movable operation succeeded. But memory is onlined as
   ZONE_NORMAL, not ZONE_MOVABLE.

After applying patch:
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  7864320
           managed  7864320
   # echo online_movable > memory4097/state
   bash: echo: write error: Invalid argument
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  7864320
           managed  7864320

   online_movable operation failed because of failure of changing
   the memory zone from ZONE_NORMAL to ZONE_MOVABLE

Fixes: df429ac0 ("memory-hotplug: more general validation of zone during online")
Link: http://lkml.kernel.org/r/2f9c3837-33d7-b6e5-59c0-6ca4372b2d84@gmail.comSigned-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: NReza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8a1f780e

SUNRPC: cleanup ida information when removing sunrpc module · c929ea0b

由 Kinglong Mee 提交于 1月 20, 2017

After removing sunrpc module, I get many kmemleak information as,
unreferenced object 0xffff88003316b1e0 (size 544):
  comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0
    [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0
    [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150
    [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180
    [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
    [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd]
    [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd]
    [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc]
    [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc]
    [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc]
    [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0
    [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0
    [<ffffffffb0390f1f>] vfs_write+0xef/0x240
    [<ffffffffb0392fbd>] SyS_write+0xad/0x130
    [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9
    [<ffffffffffffffff>] 0xffffffffffffffff

I found, the ida information (dynamic memory) isn't cleanup.
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Fixes: 2f048db4 ("SUNRPC: Add an identifier for struct rpc_clnt")
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

c929ea0b

nfs: Don't increment lock sequence ID after NFS4ERR_MOVED · 059aa734

由 Chuck Lever 提交于 1月 22, 2017

Xuan Qi reports that the Linux NFSv4 client failed to lock a file
that was migrated. The steps he observed on the wire:

1. The client sent a LOCK request to the source server
2. The source server replied NFS4ERR_MOVED
3. The client switched to the destination server
4. The client sent the same LOCK request to the destination
   server with a bumped lock sequence ID
5. The destination server rejected the LOCK request with
   NFS4ERR_BAD_SEQID

RFC 3530 section 8.1.5 provides a list of NFS errors which do not
bump a lock sequence ID.

However, RFC 3530 is now obsoleted by RFC 7530. In RFC 7530 section
9.1.7, this list has been updated by the addition of NFS4ERR_MOVED.
Reported-by: NXuan Qi <xuan.qi@oracle.com>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v3.7+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

059aa734

21 1月, 2017 1 次提交

virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving · 6391a448

由 Jason Wang 提交于 1月 20, 2017

Commit 501db511 ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on
xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on receiving path too,
fixing this by adding a hint (has_data_valid) and set it only on the
receiving path.

Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NRolf Neugebauer <rolf.neugebauer@docker.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6391a448

20 1月, 2017 1 次提交

Revert "PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag" · e326ce01

由 Rafael J. Wysocki 提交于 1月 20, 2017

Revert commit 08b98d32 (PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0
flag) as it caused system suspend (in the default configuration) to fail
on Dell XPS13 (9360) with the Kaby Lake processor.

Fixes: 08b98d32 (PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag)
Reported-by: NPaul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

e326ce01

19 1月, 2017 2 次提交

gpio: provide lockdep keys for nested/unnested irqchips · 739e6f59

由 Linus Walleij 提交于 1月 11, 2017

The helper function for adding a GPIO chip compiles in a lockdep
key for debugging, the same key is needed for nested chips as
well.

The macro construction is unreadable, replace this with two
static inlines instead.

The _gpiochip_irqchip_add prefixed function is not helpful,
rename it with gpiochip_irqchip_add_key() that tell us what the
function is actually doing.

Fixes: d245b3f9 ("gpio: simplify adding threaded interrupts")
Cc: Roger Quadros <rogerq@ti.com>
Reported-by: NClemens Gruber <clemens.gruber@pqgruber.com>
Reported-by: NRoger Quadros <rogerq@ti.com>
Reported-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Tested-by: NClemens Gruber <clemens.gruber@pqgruber.com>
Tested-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>

739e6f59

bpf: don't trigger OOM killer under pressure with map alloc · d407bd25

由 Daniel Borkmann 提交于 1月 18, 2017

This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(),
that are to be used for map allocations. Using kmalloc() for very large
allocations can cause excessive work within the page allocator, so i) fall
back earlier to vmalloc() when the attempt is considered costly anyway,
and even more importantly ii) don't trigger OOM killer with any of the
allocators.

Since this is based on a user space request, for example, when creating
maps with element pre-allocation, we really want such requests to fail
instead of killing other user space processes.

Also, don't spam the kernel log with warnings should any of the allocations
fail under pressure. Given that, we can make backend selection in
bpf_map_area_alloc() generic, and convert all maps over to use this API
for spots with potentially large allocation requests.

Note, replacing the one kmalloc_array() is fine as overflow checks happen
earlier in htab_map_alloc(), since it must also protect the multiplication
for vmalloc() should kmalloc_array() fail.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d407bd25

18 1月, 2017 2 次提交

virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit · 501db511

由 Rolf Neugebauer 提交于 1月 17, 2017

This patch part reverts fd2a0437 and e858fae2 which introduced a
subtle change in how the virtio_net flags are derived from the SKBs
ip_summed field.

With the above commits, the flags are set to VIRTIO_NET_HDR_F_DATA_VALID
when ip_summed == CHECKSUM_UNNECESSARY, thus treating it differently to
ip_summed == CHECKSUM_NONE, which should be the same.

Further, the virtio spec 1.0 / CS04 explicitly says that
VIRTIO_NET_HDR_F_DATA_VALID must not be set by the driver.

Fixes: fd2a0437 ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Fixes: e858fae2 (" virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
Signed-off-by: NRolf Neugebauer <rolf.neugebauer@docker.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

501db511

taint/module: Fix problems when out-of-kernel driver defines true or false · 5eb7c0d0

由 Larry Finger 提交于 1月 01, 2017

Commit 7fd8329b ("taint/module: Clean up global and module taint
flags handling") used the key words true and false as character members
of a new struct. These names cause problems when out-of-kernel modules
such as VirtualBox include their own definitions of true and false.

Fixes: 7fd8329b ("taint/module: Clean up global and module taint flags handling")
Signed-off-by: NLarry Finger <Larry.Finger@lwfinger.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
Reviewed-by: NPetr Mladek <pmladek@suse.com>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NJessica Yu <jeyu@redhat.com>

5eb7c0d0

17 1月, 2017 1 次提交

bpf: rework prog_digest into prog_tag · f1f7714e

由 Daniel Borkmann 提交于 1月 13, 2017

Commit 7bd509e3 ("bpf: add prog_digest and expose it via
fdinfo/netlink") was recently discussed, partially due to
admittedly suboptimal name of "prog_digest" in combination
with sha1 hash usage, thus inevitably and rightfully concerns
about its security in terms of collision resistance were
raised with regards to use-cases.

The intended use cases are for debugging resp. introspection
only for providing a stable "tag" over the instruction sequence
that both kernel and user space can calculate independently.
It's not usable at all for making a security relevant decision.
So collisions where two different instruction sequences generate
the same tag can happen, but ideally at a rather low rate. The
"tag" will be dumped in hex and is short enough to introspect
in tracepoints or kallsyms output along with other data such
as stack trace, etc. Thus, this patch performs a rename into
prog_tag and truncates the tag to a short output (64 bits) to
make it obvious it's not collision-free.

Should in future a hash or facility be needed with a security
relevant focus, then we can think about requirements, constraints,
etc that would fit to that situation. For now, rework the exposed
parts for the current use cases as long as nothing has been
released yet. Tested on x86_64 and s390x.

Fixes: 7bd509e3 ("bpf: add prog_digest and expose it via fdinfo/netlink")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f1f7714e

16 1月, 2017 1 次提交

cpu/hotplug: Provide dynamic range for prepare stage · 4205e478

由 Thomas Gleixner 提交于 1月 10, 2017

Mathieu reported that the LTTNG modules are broken as of 4.10-rc1 due to
the removal of the cpu hotplug notifiers.

Usually I don't care much about out of tree modules, but LTTNG is widely
used in distros. There are two ways to solve that:

1) Reserve a hotplug state for LTTNG

2) Add a dynamic range for the prepare states.

While #1 is the simplest solution, #2 is the proper one as we can convert
in tree users, which do not care about ordering, to the dynamic range as
well.

Add a dynamic range which allows LTTNG to request states in the prepare
stage.
Reported-and-tested-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701101353010.3401@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

4205e478

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功