Commits · d6aaba615a482ce7d3ec218cf7b8d02d0d5753b8 · openanolis / cloud-kernel

02 Aug, 2017 4 commits

x86/intel_rdt/cqm: Add tasks file support · d6aaba61

Authored by Vikas Shivappa on Jul 25, 2017

The root directory, ctrl_mon and monitor groups are populated
with a read/write file named "tasks". When read, it shows all the task
IDs assigned to the resource group.

Tasks can be added to groups by writing the PID to the file. A task can
be present in one "ctrl_mon" group and one "monitor" group at the same
time, i.e. PID_x can be seen in both a ctrl_mon group and a monitor
group. When a task is added to a ctrl_mon group, it is automatically
removed from the ctrl_mon group it previously belonged to. Similarly, if
a task is moved to a monitor group, it is removed from its previous
monitor group. Also, since a monitor group can only hold a subset of the
tasks of its parent ctrl_mon group, a task can be moved to a monitor
group only if it is already present in the parent ctrl_mon group.

Task membership is indicated by a new field in the task_struct, "u32
rmid", which holds the RMID for the task. RMID=0 is reserved for the
default root group, to which all tasks belong at mount.

[tony: zero the rmid if rdtgroup was deleted when task was being moved]
Signed-off-by: Tony Luck <tony.luck@linux.intel.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-16-git-send-email-vikas.shivappa@linux.intel.com
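The membership rules described in the message above can be modelled in a few lines. The following is an illustrative Python sketch, not kernel code: the class ResctrlModel and its method names are hypothetical, and the rule that a task drops its monitor group when it moves to a different ctrl_mon group is an assumption of the sketch that the message does not spell out.

```python
class ResctrlModel:
    """Toy model of the "tasks" file semantics; all names are hypothetical."""

    def __init__(self):
        self.ctrl_groups = {"root"}   # ctrl_mon groups; "root" is the default
        self.parent_of = {}           # monitor group -> parent ctrl_mon group
        self.ctrl_of = {}             # pid -> ctrl_mon group
        self.mon_of = {}              # pid -> monitor group (optional)

    def fork(self, pid):
        # New tasks start in the default root group.
        self.ctrl_of[pid] = "root"

    def mkdir_ctrl_mon(self, name):
        self.ctrl_groups.add(name)

    def mkdir_mon(self, parent, name):
        if parent not in self.ctrl_groups:
            raise KeyError(parent)
        self.parent_of[name] = parent

    def write_task(self, group, pid):
        """Model of writing a PID to <group>/tasks."""
        if pid not in self.ctrl_of:
            raise KeyError(pid)
        if group in self.ctrl_groups:
            # Joining a ctrl_mon group implicitly leaves the previous one.
            self.ctrl_of[pid] = group
            mon = self.mon_of.get(pid)
            # Assumption: a monitor group under another parent no longer applies.
            if mon is not None and self.parent_of[mon] != group:
                del self.mon_of[pid]
        elif group in self.parent_of:
            # A monitor group only accepts tasks of its parent ctrl_mon group.
            if self.ctrl_of[pid] != self.parent_of[group]:
                raise ValueError("task is not in the parent ctrl_mon group")
            self.mon_of[pid] = group
        else:
            raise KeyError(group)

    def read_tasks(self, group):
        """Model of reading <group>/tasks: the sorted PIDs in that group."""
        if group in self.ctrl_groups:
            return sorted(p for p, g in self.ctrl_of.items() if g == group)
        return sorted(p for p, g in self.mon_of.items() if g == group)
```

In this model, writing a PID to a monitor group fails until the same PID has been written to the parent ctrl_mon group, which mirrors the ordering constraint in the message.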

x86/intel_rdt: Change closid type from int to u32 · 0734ded1

Authored by Vikas Shivappa on Jul 25, 2017

The OS associates a CLOSID (class of service ID) with a task by writing
the high 32 bits of the per-CPU IA32_PQR_ASSOC MSR when the task is
scheduled in. CPUID.(EAX=10H, ECX=1):EDX[15:0] enumerates the max CLOSID
supported, and it is zero indexed. Hence change the type from int to u32.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-15-git-send-email-vikas.shivappa@linux.intel.com
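As a rough illustration of why an unsigned 32-bit type fits, the sketch below packs a CLOSID into the high half of a 64-bit MSR value and derives the number of classes from the zero-indexed CPUID field. It is a Python model only: the function names are hypothetical, and placing the RMID in the low 32 bits is an assumption of the sketch.

```python
U32_MAX = 0xFFFFFFFF


def pqr_assoc_value(closid, rmid=0):
    """Build a 64-bit IA32_PQR_ASSOC value: CLOSID in the high 32 bits.

    Assumption of this sketch: the RMID travels in the low 32 bits.
    """
    if not 0 <= closid <= U32_MAX:
        raise ValueError("closid must fit in a u32")
    if not 0 <= rmid <= U32_MAX:
        raise ValueError("rmid must fit in a u32")
    return (closid << 32) | rmid


def num_closids(edx):
    """EDX[15:0] holds the highest CLOSID, zero indexed, so count = max + 1."""
    return (edx & 0xFFFF) + 1
```

A negative CLOSID has no meaning in either the MSR or the CPUID field, which is why the sketch rejects it outright.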

x86/intel_rdt: Introduce a common compile option for RDT · f01d7d51

Authored by Vikas Shivappa on Jul 25, 2017

We currently have CONFIG_RDT_A, which covers the RDT (Resource Director
Technology) allocation based resctrl filesystem interface. In
preparation for adding RDT monitoring support to the same resctrl
filesystem, change the config option to CONFIG_RDT, which includes both
the RDT allocation and monitoring code.

No functional change.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-4-git-send-email-vikas.shivappa@linux.intel.com

x86/perf/cqm: Wipe out perf based cqm · c39a0e2c

Authored by Vikas Shivappa on Jul 25, 2017

'perf cqm' never worked due to the incompatibility between the perf
infrastructure and the cqm hardware support. The hardware uses RMIDs to
track the LLC occupancy of tasks, and these RMIDs are per package. This
makes monitoring a hierarchy like cgroup along with monitoring of tasks
separately difficult, and several patches sent to lkml to fix this were
NACKed. Furthermore, the following issues in the current perf cqm make
it almost unusable:

    1. No support to monitor the same group of tasks for which we do
    allocation using resctrl.

    2. It gives random and inaccurate data (mostly 0s) once we run out
    of RMIDs due to issues in Recycling.

    3. Recycling results in inaccuracy of data because we cannot
    guarantee that the RMID was stolen from a task when it was not
    pulling data into cache or even when it pulled the least data. Also
    for monitoring llc_occupancy, if we stop using an RMID_x and then
    start using an RMID_y after we reclaim an RMID from another event,
    we miss accounting all the occupancy that was tagged to RMID_x at a
    later perf_count.

    4. Recycling code makes the monitoring code complex, including
    scheduling, because the event can lose its RMID at any time. Since MBM
    counters count bandwidth for a period of time by taking snapshots of
    total bytes at two different times, recycling complicates the way we
    count MBM in a hierarchy. Also we need a spin lock while we do the
    processing to account for MBM counter overflow. We also currently
    use a spin lock in scheduling to prevent the RMID from being taken
    away.

    5. Lack of support when we run different kinds of events like task,
    system-wide and cgroup events together. Data mostly prints 0s. This
    is also because we can have only one RMID tied to a cpu as defined
    by the cqm hardware, but perf can at the same time tie multiple
    events during one sched_in.

    6. No support for monitoring a group of tasks. There is partial support
    for cgroup but it does not work once there is a hierarchy of cgroups
    or if we want to monitor a task in a cgroup and the cgroup itself.

    7. No support for monitoring tasks for their lifetime without perf
    overhead.

    8. It reported the aggregate cache occupancy or memory bandwidth over
    all sockets. But most cloud and VMM based use cases want to know the
    individual per-socket usage.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-2-git-send-email-vikas.shivappa@linux.intel.com

27 Jul, 2017 4 commits

genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration" · 83979133

Authored by Thomas Gleixner on Jul 27, 2017

That commit was part of the changes moving x86 to the generic CPU hotplug
interrupt migration code. The force flag was required on x86 before the
hierarchical irqdomain rework, but invoking set_affinity() with force=true
stayed and had no side effects.

At some point in the past, the force flag got repurposed to support the
exynos timer interrupt affinity setting to a not yet online CPU, so the
interrupt controller callback does not verify the supplied affinity mask
against cpu_online_mask.

Setting the flag in the CPU hotplug code causes the cpu online masking to
be blocked on these irq controllers and results in potentially affining an
interrupt to the CPU which is unplugged, i.e. instead of moving it away,
it's just reassigned to it.

As the force flag is no longer needed on x86, it's safe to revert that
patch so the ARM irqchips which use the force flag work again.

Add comments to that effect, so this won't happen again.

Note: The online mask handling should be done in the generic code and the
force flag and the masking in the irq chips removed altogether, but
that's not a change possible for 4.13.

Fixes: 77f85e66 ("genirq/cpuhotplug: Set force affinity flag on hotplug migration")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707271217590.3109@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU · a3287c41

Authored by Will Deacon on Jul 25, 2017

Since the PMU register interface is banked per CPU, CPU PMU interrupts
cannot be handled by a CPU other than the one with the PMU asserting the
interrupt. This means that migrating PMU SPIs, as we do during a CPU
hotplug operation, doesn't make any sense and can lead to the IRQ being
disabled entirely if we route a spurious IRQ to the new affinity target.

This has been observed in practice on AMD Seattle, where CPUs on the
non-boot cluster appear to take a spurious PMU IRQ when coming online,
which is routed to CPU0 where it cannot be handled.

This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their
affinity prior to requesting them, ensuring that they cannot
be migrated during hotplug events. This interacts badly with the DB8500
erratum workaround that ping-pongs the interrupt affinity from the handler,
so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags
to be overridden in the platdata.

Fixes: 3cf7ee98 ("drivers/perf: arm_pmu: move irq request/free into probe")
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

dm, dax: Make sure dm_dax_flush() is called if device supports it · 273752c9

Authored by Vivek Goyal on Jul 26, 2017

Currently dm_dax_flush() is not being called, even if the underlying dax
device supports write cache, because DAXDEV_WRITE_CACHE is not being
propagated up to the DM dax device.

If the underlying dax device supports write cache, set
DAXDEV_WRITE_CACHE on the DM dax device.  This will cause dm_dax_flush()
to be called.

Fixes: abebfbe2 ("dm: add ->flush() dax operation support")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

KVM: make pid available for uevents without debugfs · fdeaf7e3

Authored by Claudio Imbrenda on Jul 24, 2017

Simplify and improve the code so that the PID is always available in
the uevent even when debugfs is not available.

This adds a userspace_pid field to struct kvm, as per Radim's
suggestion, so that the PID can be retrieved on destruction too.

Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Fixes: 286de8f6 ("KVM: trigger uevents when creating or destroying a VM")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

26 Jul, 2017 1 commit

nvme-fc: revise TRADDR parsing · 9c5358e1

Authored by James Smart on Jul 17, 2017

The FC-NVME spec hasn't locked down the format string for TRADDR.
The spec currently favors "nn-<16hexdigits>:pn-<16hexdigits>",
where the WWNs are hex values that are not prefixed by 0x.

Most implementations so far expect a string format of
"nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport
uses the match_u64 parser which requires a leading 0x prefix to set
the base properly. If it's not there, a match will either fail or return
a base 10 value.

The resolution in T11 is being pushed out. Therefore, to fix things now and
to cover any eventuality and any implementations already in the field,
this patch adds support for both formats.

The change consists of replacing the token matching routine with a
routine that validates the fixed string format, and then builds
a local copy of the hex name with a 0x prefix before calling
the system parser.

Note: the same parser routine exists in both the initiator and target
transports. Given this is about the only "shared" item, we chose to
replicate rather than create an interdependency on some shared code.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
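The approach described above (validate the fixed string layout, then hand a 0x-prefixed copy of each name to a generic integer parser) might look like the following in a Python sketch. The function name parse_traddr is hypothetical, and rejecting strings that mix the prefixed and unprefixed forms is a choice made for this sketch rather than something the message states.

```python
import re

# Fixed layout: "nn-" + optional "0x" + 16 hex digits, ":pn-" + the same.
_TRADDR_RE = re.compile(
    r"nn-(0x)?([0-9a-fA-F]{16}):pn-(0x)?([0-9a-fA-F]{16})"
)


def parse_traddr(traddr):
    """Return (node_name, port_name) as integers for either TRADDR format."""
    m = _TRADDR_RE.fullmatch(traddr)
    if m is None:
        raise ValueError("malformed TRADDR: %r" % traddr)
    nn_prefix, nn_hex, pn_prefix, pn_hex = m.groups()
    # Sketch-level choice: both names must use the same form.
    if bool(nn_prefix) != bool(pn_prefix):
        raise ValueError("mixed 0x prefix usage: %r" % traddr)
    # Always parse a 0x-prefixed copy so the base is unambiguous.
    return int("0x" + nn_hex, 16), int("0x" + pn_hex, 16)
```

Normalising to the prefixed form before parsing avoids the failure mode the message mentions, where an unprefixed name either fails to match or is read as a base 10 value.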

25 Jul, 2017 2 commits

nvme: fabrics commands should use the fctype field for data direction · 2fd4167f

Authored by Jon Derrick on Jul 12, 2017

Fabrics commands with opcode 0x7F use the fctype field to indicate data
direction.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: eb793e2c ("nvme.h: add NVMe over Fabrics definitions")
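A minimal Python sketch of the rule in this commit follows. It assumes that data direction is carried in bit 0 of the relevant field (set means host-to-controller, i.e. a write), as in the usual NVMe opcode encoding; the function name is_write is hypothetical and this is a model, not the driver code.

```python
NVME_FABRICS_OPCODE = 0x7F


def is_write(opcode, fctype=0):
    """Return True if the command transfers data from host to controller.

    Assumption: bit 0 of the field in use encodes the write direction.
    """
    if opcode == NVME_FABRICS_OPCODE:
        # Every Fabrics command shares opcode 0x7F, so the opcode bits
        # say nothing about direction; look at fctype instead.
        return bool(fctype & 1)
    return bool(opcode & 1)
```

For example, a Fabrics command with fctype 0x01 is treated as a write in this model, while fctype 0x04 is not, even though both carry opcode 0x7F.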

sched/wait: Clean up some documentation warnings · 6c423f57

Authored by Jonathan Corbet on Jul 24, 2017

A couple of kerneldoc comments in <linux/wait.h> had incorrect names for
macro parameters, with this unsightly result:

  ./include/linux/wait.h:555: warning: No description found for parameter 'wq'
  ./include/linux/wait.h:555: warning: Excess function parameter 'wq_head' description in 'wait_event_interruptible_hrtimeout'
  ./include/linux/wait.h:759: warning: No description found for parameter 'wq_head'
  ./include/linux/wait.h:759: warning: Excess function parameter 'wq' description in 'wait_event_killable'

Correct the comments and kill the warnings.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20170724135800.769c4042@lwn.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

24 Jul, 2017 1 commit

uuid: remove uuid_be · 832e4c83

Authored by Christoph Hellwig on May 11, 2017

Everything uses uuid_t now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

21 Jul, 2017 2 commits

NFS: Store the raw NFS access mask in the inode's access cache · bd8b2441

Authored by Trond Myklebust on Jul 11, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

bpf: fix mixed signed/unsigned derived min/max value bounds · 4cabc5b1

Authored by Daniel Borkmann on Jul 21, 2017

Edward reported that there's an issue in min/max value bounds
tracking when signed and unsigned compares both provide hints
on limits when having unknown variables. E.g. a program such
as the following should have been rejected:

   0: (7a) *(u64 *)(r10 -8) = 0
   1: (bf) r2 = r10
   2: (07) r2 += -8
   3: (18) r1 = 0xffff8a94cda93400
   5: (85) call bpf_map_lookup_elem#1
   6: (15) if r0 == 0x0 goto pc+7
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
   7: (7a) *(u64 *)(r10 -16) = -8
   8: (79) r1 = *(u64 *)(r10 -16)
   9: (b7) r2 = -1
  10: (2d) if r1 > r2 goto pc+3
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
  R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
  11: (65) if r1 s> 0x1 goto pc+2
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
  R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
  12: (0f) r0 += r1
  13: (72) *(u8 *)(r0 +0) = 0
  R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
  R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
  14: (b7) r0 = 0
  15: (95) exit

What happens is that in the first part ...

   8: (79) r1 = *(u64 *)(r10 -16)
   9: (b7) r2 = -1
  10: (2d) if r1 > r2 goto pc+3

... r1 carries an unsigned value, and is compared as unsigned
against a register carrying an immediate. Verifier deduces in
reg_set_min_max() that since the compare is unsigned and operation
is greater than (>), that in the fall-through/false case, r1's
minimum bound must be 0 and maximum bound must be r2. Latter is
larger than the bound and thus max value is reset back to being
'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now
'R1=inv,min_value=0'. The subsequent test ...

  11: (65) if r1 s> 0x1 goto pc+2

... is a signed compare of r1 with immediate value 1. Here,
verifier deduces in reg_set_min_max() that since the compare
is signed this time and operation is greater than (>), that
in the fall-through/false case, we can deduce that r1's maximum
bound must be 1, meaning with prior test, we result in r1 having
the following state: R1=inv,min_value=0,max_value=1. Given that
the actual value this holds is -8, the bounds are wrongly deduced.
When this is being added to r0 which holds the map_value(_adj)
type, then subsequent store access in above case will go through
check_mem_access() which invokes check_map_access_adj(), that
will then probe whether the map memory is in bounds based
on the min_value and max_value as well as access size since
the actual unknown value is min_value <= x <= max_value; commit
fce366a9 ("bpf, verifier: fix alu ops against map_value{,
_adj} register types") provides some more explanation on the
semantics.

It's worth noting in this context that in the current code,
min_value and max_value tracking are used for two things, i)
dynamic map value access via check_map_access_adj() and since
commit 06c1c049 ("bpf: allow helpers access to variable memory")
ii) also enforced at check_helper_mem_access() when passing a
memory address (pointer to packet, map value, stack) and length
pair to a helper and the length in this case is an unknown value
defining an access range through min_value/max_value in that
case. The min_value/max_value tracking is /not/ used in the
direct packet access case to track ranges. However, the issue
also affects case ii), for example, the following crafted program
based on the same principle must be rejected as well:

   0: (b7) r2 = 0
   1: (bf) r3 = r10
   2: (07) r3 += -512
   3: (7a) *(u64 *)(r10 -16) = -8
   4: (79) r4 = *(u64 *)(r10 -16)
   5: (b7) r6 = -1
   6: (2d) if r4 > r6 goto pc+5
  R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
  R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
   7: (65) if r4 s> 0x1 goto pc+4
  R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
  R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1
  R10=fp
   8: (07) r4 += 1
   9: (b7) r5 = 0
  10: (6a) *(u16 *)(r10 -512) = 0
  11: (85) call bpf_skb_load_bytes#26
  12: (b7) r0 = 0
  13: (95) exit

Meaning, while we initialize the max_value stack slot that the
verifier thinks we access in the [1,2] range, in reality we
pass -7 as length which is interpreted as u32 in the helper.
Thus, this issue is relevant also for the case of helper ranges.
Resetting both bounds in check_reg_overflow() in case only one
of them exceeds limits is also not enough as similar test can be
created that uses values which are within range, thus also here
learned min value in r1 is incorrect when mixed with later signed
test to create a range:

   0: (7a) *(u64 *)(r10 -8) = 0
   1: (bf) r2 = r10
   2: (07) r2 += -8
   3: (18) r1 = 0xffff880ad081fa00
   5: (85) call bpf_map_lookup_elem#1
   6: (15) if r0 == 0x0 goto pc+7
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
   7: (7a) *(u64 *)(r10 -16) = -8
   8: (79) r1 = *(u64 *)(r10 -16)
   9: (b7) r2 = 2
  10: (3d) if r2 >= r1 goto pc+3
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
  R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
  11: (65) if r1 s> 0x4 goto pc+2
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
  R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
  12: (0f) r0 += r1
  13: (72) *(u8 *)(r0 +0) = 0
  R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
  R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
  14: (b7) r0 = 0
  15: (95) exit

This leaves us with two options for fixing this: i) to invalidate
all prior learned information once we switch signed context, ii)
to track min/max signed and unsigned boundaries separately as
done in [0]. (Given latter introduces major changes throughout
the whole verifier, it's rather net-next material, thus this
patch follows option i), meaning we can derive bounds either
from only signed tests or only unsigned tests.) There is still the
case of adjust_reg_min_max_vals(), where we adjust bounds on ALU
operations, meaning programs like the following where boundaries
on the reg get mixed in context later on when bounds are merged
on the dst reg must get rejected, too:

   0: (7a) *(u64 *)(r10 -8) = 0
   1: (bf) r2 = r10
   2: (07) r2 += -8
   3: (18) r1 = 0xffff89b2bf87ce00
   5: (85) call bpf_map_lookup_elem#1
   6: (15) if r0 == 0x0 goto pc+6
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
   7: (7a) *(u64 *)(r10 -16) = -8
   8: (79) r1 = *(u64 *)(r10 -16)
   9: (b7) r2 = 2
  10: (3d) if r2 >= r1 goto pc+2
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
  R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
  11: (b7) r7 = 1
  12: (65) if r7 s> 0x0 goto pc+2
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
  R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
  13: (b7) r0 = 0
  14: (95) exit

  from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
  R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
  15: (0f) r7 += r1
  16: (65) if r7 s> 0x4 goto pc+2
  R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
  R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
  17: (0f) r0 += r7
  18: (72) *(u8 *)(r0 +0) = 0
  R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
  R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
  19: (b7) r0 = 0
  20: (95) exit

Meaning, in adjust_reg_min_max_vals() we must also reset range
values on the dst when src/dst registers have mixed signed/
unsigned derived min/max value bounds with one unbounded value
as otherwise they can be added together deducing false boundaries.
Once both boundaries are established from either ALU ops or
compare operations w/o mixing signed/unsigned insns, then they
can safely be added to other regs also having both boundaries
established. Adding regs with one unbounded side to a map value
where the bounded side has been learned w/o mixing ops is
possible, but the resulting map value won't recover from that,
meaning such an op is considered invalid at the time of actual
access. Invalid bounds are set on the dst reg in case i) src reg,
or ii) in case dst reg already had them. The only way to recover
would be to perform i) ALU ops but only 'add' is allowed on map
value types or ii) comparisons, but these are disallowed on
pointers in case they span a range. This is fine as only BPF_JEQ
and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers
which potentially turn them into PTR_TO_MAP_VALUE type depending
on the branch, so only here min/max value cannot be invalidated
for them.

In terms of state pruning, value_from_signed is considered
as well in states_equal() when dealing with adjusted map values.
With regards to breaking existing programs, there is a small
risk, but the use-cases where this could occur are rather narrow,
and mixing compares is probably unlikely.

Joint work with Josef and Edward.

  [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html

Fixes: 48461135 ("bpf: allow access into map value arrays")
Reported-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
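The core of the problem in the message above is that the same 64-bit pattern passes an unsigned and a signed compare for different reasons. The Python sketch below replays the two compares from the first program on the value -8 and shows the length a u32-taking helper would see in the second program. It is a model of the arithmetic only, not of the verifier, and the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def as_u64(x):
    """Reinterpret an integer as an unsigned 64-bit register value."""
    return x & MASK64


def as_s64(x):
    """Reinterpret an integer as a signed 64-bit register value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def reaches_access(r1):
    """True if r1 falls through both compares of the first program."""
    if as_u64(r1) > as_u64(-1):   # insn 10: if r1 > r2 (unsigned), r2 = -1
        return False
    if as_s64(r1) > 1:            # insn 11: if r1 s> 0x1 (signed)
        return False
    return True


def helper_length(r4):
    """Length seen by a u32-taking helper after 'r4 += 1' (second program)."""
    return (r4 + 1) & 0xFFFFFFFF
```

With r1 = -8, reaches_access() returns True although -8 lies outside the [0, 1] range that was derived by combining the two tests, and helper_length(-8) yields 4294967289 rather than a value in [1, 2].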

20 Jul, 2017 4 commits

nvme: fix byte swapping in the streams code · dc1a0afb

Authored by Christoph Hellwig on Jul 14, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dma-coherent: introduce interface for default DMA pool · 43fc509c

Authored by Vladimir Murzin on Jul 20, 2017

Christoph noticed [1] that the default DMA pool in its current form
overloads the DMA coherent infrastructure. In reply, Robin suggested [2]
splitting the per-device vs. global pool interfaces, so that
allocation/release from the default DMA pool is driven by the dma ops
implementation.

This patch implements Robin's idea and provides an interface to
allocate/release/mmap the default (aka global) DMA pool.

To make it clear that the existing *_from_coherent routines work on the
per-device pool, rename them to *_from_dev_coherent.

[1] https://lkml.org/lkml/2017/7/7/370
[2] https://lkml.org/lkml/2017/7/7/431

Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Andras Szemzo <sza@esh.hu>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

trace: fix the errors caused by incompatible type of RCU variables · f86f4180

Authored by Chunyan Zhang on Jun 07, 2017

The variables which are processed by RCU functions should be annotated
as RCU, otherwise sparse will report errors like the one below:

"error: incompatible types in comparison expression (different
address spaces)"

Link: http://lkml.kernel.org/r/1496823171-7758-1-git-send-email-zhang.chunyan@linaro.org
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
[ Updated to not be 100% 80 column strict ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

llist: clang: introduce member_address_is_nonnull() · beaec533

Authored by Alexander Potapenko on Jul 19, 2017

Currently llist_for_each_entry() and llist_for_each_entry_safe() iterate
until &pos->member != NULL.  But when building the kernel with Clang,
the compiler assumes &pos->member cannot be NULL if the member's offset
is greater than 0 (which would be equivalent to the object being
non-contiguous in memory).  Therefore the loop condition is always true,
and the loops become infinite.

To work around this, introduce the member_address_is_nonnull() macro,
which casts the object pointer to uintptr_t, thus letting the member
pointer be NULL.

Signed-off-by: Alexander Potapenko <glider@google.com>
Tested-by: Sodagudi Prasad <psodagud@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18 Jul, 2017 4 commits

include: usb: audio: specify exact endiannes of descriptors · 8bd226f9

Authored by Ruslan Bilovol on Jun 25, 2017

The USB spec says that multiple-byte fields are stored in
little-endian order (see chapter 8.1 of the USB2.0 spec and
chapter 7.1 of the USB3.0 spec), thus mark such fields as LE
for the UAC1 and UAC2 headers.

Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
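To illustrate what the LE annotation protects against, the Python sketch below decodes the fixed part of a UAC1 audio-control header descriptor with an explicit little-endian format. The field layout is assumed from the usual UAC1 header (three single-byte fields, then the 16-bit bcdADC and wTotalLength, then bInCollection); the function name and the sample bytes are made up for the example.

```python
import struct

# "<" forces little-endian with no padding: B = 1 byte, H = 16-bit field.
_AC_HEADER_FMT = "<BBBHHB"


def parse_uac1_ac_header(raw):
    """Decode the fixed part of a UAC1 AC header descriptor (assumed layout)."""
    (b_length, b_descriptor_type, b_descriptor_subtype,
     bcd_adc, w_total_length, b_in_collection) = struct.unpack_from(
        _AC_HEADER_FMT, raw)
    return {
        "bLength": b_length,
        "bDescriptorType": b_descriptor_type,
        "bDescriptorSubtype": b_descriptor_subtype,
        "bcdADC": bcd_adc,
        "wTotalLength": w_total_length,
        "bInCollection": b_in_collection,
    }


# Made-up sample: bcdADC = 0x0100 and wTotalLength = 30, both stored LE.
SAMPLE = bytes([0x09, 0x24, 0x01, 0x00, 0x01, 0x1E, 0x00, 0x01, 0x01])
```

Reading the same two wTotalLength bytes as big-endian would give 0x1E00 (7680) instead of 30, which is the class of bug an explicit endianness annotation lets tooling catch.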

{net, IB}/mlx4: Remove gfp flags argument · 8900b894

Authored by Leon Romanovsky on May 23, 2017

The caller to the driver now marks GFP_NOIO allocations with the help
of memalloc_noio-* calls. This makes it redundant to pass gfp flags
down to the driver, since they can only be GFP_KERNEL.

The patch removes the gfp flags argument and updates all driver paths.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

bpf: check NULL for sk_to_full_sk() return value · df39a9f1

Authored by WANG Cong on Jul 17, 2017

When req->rsk_listener is NULL, sk_to_full_sk() returns
NULL too, so we have to check its return value against
NULL here.

Fixes: 40304b2a ("bpf: BPF support for sock_ops")
Reported-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

jhash: fix -Wimplicit-fallthrough warnings · 13c401f3

Authored by Jakub Kicinski on Jul 14, 2017

GCC 7 added a new -Wimplicit-fallthrough warning.  It's only enabled
with W=1, but since linux/jhash.h is included in over a hundred places
(including other global headers) it seems worthwhile fixing this
warning.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Jul, 2017 3 commits

netfilter: remove old pre-netns era hook api · cf56c2f8

Authored by Florian Westphal on Jul 06, 2017

No more users in the tree, remove this.

The old api is racy wrt. module removal, all users have been converted
to the netns-aware api.

The old api pretended we still have global hooks but that has not been
true for a long time.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

mmc: host: omap_hsmmc: remove unused platform callbacks · 36acbd9e

Authored by Faiz Abbas on Jul 14, 2017

Remove unused callbacks in the omap_hsmmc_platform_data structure.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

libceph: fix old style declaration warnings · e67ae2b7

Authored by Arnd Bergmann on Jul 10, 2017

The new macros don't follow the usual style for declarations,
which we get a warning for with 'make W=1':

In file included from fs/ceph/mds_client.c:16:0:
include/linux/ceph/ceph_features.h:74:1: error: 'static' is not at beginning of declaration [-Werror=old-style-declaration]

This moves the 'static' keyword to the front of the
declaration.

Fixes: f179d3ba ("libceph: new features macros")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

15 Jul, 2017 3 commits

replace incorrect strscpy use in FORTIFY_SOURCE · 077d2ba5

Authored by Daniel Micay on Jul 14, 2017

Using strscpy was wrong because FORTIFY_SOURCE is passing the maximum
possible size of the outermost object, but strscpy defines the count
parameter as the exact buffer size, so this could copy past the end of
the source.  This would still be wrong with the planned usage of
__builtin_object_size(p, 1) for intra-object overflow checks since it's
the maximum possible size of the specified object with no guarantee of
it being that large.

Reuse of the fortified functions like this currently makes the runtime
error reporting less precise but that can be improved later on.

Noticed by Dave Jones and KASAN.
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fault-inject: parse as natural 1-based value for fail-nth write interface · 9049f2f6

Authored by Akinobu Mita on Jul 14, 2017

The value written to the fail-nth file is currently parsed as 0-based.
Parsing it as one-based is more natural to understand, and it makes it
possible to cancel the previous setup by simply writing '0'.

This change also converts task->fail_nth from signed to unsigned int.

Link: http://lkml.kernel.org/r/1491490561-10485-3-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
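The one-based semantics can be sketched as a small counter. The Python model below is an assumption-laden illustration, not the kernel implementation: the class FailNth and its methods are hypothetical, and it only captures the two properties the message names (writing N makes the N-th call fail, writing 0 cancels).

```python
class FailNth:
    """Toy model of a one-based fail-nth counter; names are hypothetical."""

    def __init__(self):
        self.fail_nth = 0  # unsigned; 0 means "nothing armed"

    def write(self, text):
        """Model of writing a decimal string to the fail-nth file."""
        n = int(text)
        if n < 0:
            raise ValueError("fail-nth is unsigned")
        self.fail_nth = n

    def should_fail(self):
        """Called once per fault-injection point reached by the task."""
        if self.fail_nth == 0:
            return False
        self.fail_nth -= 1
        # The call that brings the counter to zero is the N-th one.
        return self.fail_nth == 0
```

Because 0 is never a valid "which call" index in a one-based scheme, it is free to serve as the disarm value, which is the convenience the message points out.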

dma-buf/fence: Avoid use of uninitialised timestamp · 76250f2b

Authored by Chris Wilson on Feb 14, 2017

[  236.821534] WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff8802538683d0)
[  236.828642] 420000001e7f0000000000000000000000080000000000000000000000000000
[  236.839543]  i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
[  236.850420]                                  ^
[  236.854123] RIP: 0010:[<ffffffff81396f07>]  [<ffffffff81396f07>] fence_signal+0x17/0xd0
[  236.861313] RSP: 0018:ffff88024acd7ba0  EFLAGS: 00010282
[  236.865027] RAX: ffffffff812f6a90 RBX: ffff8802527ca800 RCX: ffff880252cb30e0
[  236.868801] RDX: ffff88024ac5d918 RSI: ffff880252f780e0 RDI: ffff880253868380
[  236.872579] RBP: ffff88024acd7bc0 R08: ffff88024acd7be0 R09: 0000000000000000
[  236.876407] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880253868380
[  236.880185] R13: ffff8802538684d0 R14: ffff880253868380 R15: ffff88024cd48e00
[  236.883983] FS:  00007f1646d1a740(0000) GS:ffff88025d000000(0000) knlGS:0000000000000000
[  236.890959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  236.894702] CR2: ffff880251360318 CR3: 000000024ad21000 CR4: 00000000001406f0
[  236.898481]  [<ffffffff8130d1ad>] i915_gem_request_retire+0x1cd/0x230
[  236.902439]  [<ffffffff8130e2b3>] i915_gem_request_alloc+0xa3/0x2f0
[  236.906435]  [<ffffffff812fb1bd>] i915_gem_do_execbuffer.isra.41+0xb6d/0x18b0
[  236.910434]  [<ffffffff812fc265>] i915_gem_execbuffer2+0x95/0x1e0
[  236.914390]  [<ffffffff812ad625>] drm_ioctl+0x1e5/0x460
[  236.918275]  [<ffffffff8110d4cf>] do_vfs_ioctl+0x8f/0x5c0
[  236.922168]  [<ffffffff8110da3c>] SyS_ioctl+0x3c/0x70
[  236.926090]  [<ffffffff814b7a5f>] entry_SYSCALL_64_fastpath+0x17/0x93
[  236.930045]  [<ffffffffffffffff>] 0xffffffffffffffff

We only set the timestamp before we mark the fence as signaled. It is
done before to avoid observers having a window in which they may see the
fence as complete but no timestamp. Having it does incur a potential for
the timestamp to be written twice, and even for it to be corrupted if
the u64 write is not atomic. Instead use a new bit to record the
presence of the timestamp, and teach the readers to wait until it is set
if the fence is complete. There still remains a race where the timestamp
for the signaled fence may be shown before the fence is reported as
signaled, but that's a pre-existing error.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Reported-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170214124001.1930-1-chris@chris-wilson.co.uk

76250f2b
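
The scheme described in 76250f2b (a separate bit that records the presence of the timestamp, with readers waiting for it once the fence is complete) can be sketched in portable C11 atomics. This is a simplified model under stated assumptions, not the dma-buf code: `struct fence_model`, the flag names and both functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch only. */
#define FLAG_SIGNALED	(1u << 0)
#define FLAG_TIMESTAMP	(1u << 1)

struct fence_model {
	atomic_uint flags;
	uint64_t timestamp;	/* valid only once FLAG_TIMESTAMP is set */
};

/* Returns true for the one caller that actually signals the fence. */
static bool fence_model_signal(struct fence_model *f, uint64_t now)
{
	unsigned int old = atomic_fetch_or(&f->flags, FLAG_SIGNALED);

	if (old & FLAG_SIGNALED)
		return false;	/* already signaled: never rewrite the timestamp */

	f->timestamp = now;	/* written exactly once, by the winner */
	/* Release ordering publishes the timestamp before the bit. */
	atomic_fetch_or_explicit(&f->flags, FLAG_TIMESTAMP,
				 memory_order_release);
	return true;
}

/* Returns false if the fence is not signaled yet. */
static bool fence_model_timestamp(struct fence_model *f, uint64_t *out)
{
	if (!(atomic_load(&f->flags) & FLAG_SIGNALED))
		return false;

	/* Signaled but timestamp not yet published: wait for the bit. */
	while (!(atomic_load_explicit(&f->flags, memory_order_acquire) &
		 FLAG_TIMESTAMP))
		;

	*out = f->timestamp;
	return true;
}
```

Because only the caller that wins the SIGNALED bit stores the timestamp, the value is written once, and a reader never consumes it before the TIMESTAMP bit has been published.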

14 Jul, 2017 12 commits

cdc_ncm: Set NTB format again after altsetting switch for Huawei devices · 2b02c20c

Committed by Enrico Mioso on Jul 11, 2017

Some firmwares in Huawei E3372H devices have been observed to switch back
to NTB 32-bit format after altsetting switch.
This patch implements a driver flag to check for the device settings and
set NTB format to 16-bit again if needed.
The flag has been activated for devices controlled by the huawei_cdc_ncm.c
driver.

V1->V2:
- fixed broken error checks
- some corrections to the commit message
V2->V3:
- variable name changes, to clarify what's happening
- check (and possibly set) the NTB format later in the common bind code path
Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Reported-and-tested-by: Christian Panton <christian@panton.org>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
CC: Bjørn Mork <bjorn@mork.no>
CC: Christian Panton <christian@panton.org>
CC: linux-usb@vger.kernel.org
CC: netdev@vger.kernel.org
CC: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b02c20c
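
The check-and-reset behaviour described in 2b02c20c can be modelled as follows. This is a toy sketch, not the cdc_ncm driver: `struct ncm_model`, `QUIRK_RESET_NTB16` and both functions are hypothetical, and the real driver talks to the device over USB control requests instead of touching a struct field.

```c
/* NTB format selectors: 0 selects NTB-16, 1 selects NTB-32. */
enum ntb_format { NTB_FORMAT_16 = 0, NTB_FORMAT_32 = 1 };

/* Hypothetical driver flag for devices that need the extra check. */
#define QUIRK_RESET_NTB16	0x1u

struct ncm_model {
	unsigned int quirks;
	enum ntb_format format;	/* what the device currently uses */
	int set_requests;	/* how many times we had to set the format */
};

/* Models a firmware that falls back to NTB-32 on altsetting switch. */
static void model_altsetting_switch(struct ncm_model *d, int fw_reverts)
{
	if (fw_reverts)
		d->format = NTB_FORMAT_32;
}

/* Called after the switch: returns 1 if the format had to be set again. */
static int model_ensure_ntb16(struct ncm_model *d)
{
	if (!(d->quirks & QUIRK_RESET_NTB16))
		return 0;	/* device not flagged: leave it alone */
	if (d->format == NTB_FORMAT_16)
		return 0;	/* already correct: no request needed */

	d->format = NTB_FORMAT_16;
	d->set_requests++;
	return 1;
}
```

Gating the check behind a per-driver flag keeps the extra work confined to the devices known to misbehave.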

NFS: Don't run wake_up_bit() when nobody is waiting... · b4f937cf

Committed by Trond Myklebust on Jul 11, 2017

"perf lock" shows fairly heavy contention for the bit waitqueue locks
when doing an I/O heavy workload.
Use a bit to tell whether or not there has been contention for a lock
so that we can optimise away the bit waitqueue operations in those cases.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b4f937cf
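
The idea in b4f937cf, recording contention in a bit so that the unlock path can skip the wakeup when nobody waited, can be sketched in C11 atomics. All names here are hypothetical, and a counter stands in for wake_up_bit(); the NFS code uses its own flag bits and the kernel bit-waitqueue API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only. */
#define BIT_LOCKED	(1u << 0)
#define BIT_CONTENDED	(1u << 1)

struct bitlock_model {
	atomic_uint flags;
	unsigned int wakeups;	/* stands in for calls to wake_up_bit() */
};

/* Try to take the lock; true on success. */
static bool bitlock_trylock(struct bitlock_model *l)
{
	return !(atomic_fetch_or(&l->flags, BIT_LOCKED) & BIT_LOCKED);
}

/* A waiter records that it is about to sleep on the lock bit. */
static void bitlock_note_waiter(struct bitlock_model *l)
{
	atomic_fetch_or(&l->flags, BIT_CONTENDED);
}

/* Release the lock; only pay for a wakeup if someone waited. */
static void bitlock_unlock(struct bitlock_model *l)
{
	unsigned int old = atomic_fetch_and(&l->flags,
					    ~(BIT_LOCKED | BIT_CONTENDED));

	if (old & BIT_CONTENDED)
		l->wakeups++;
}
```

Clearing the contended bit on unlock is a choice made for this sketch; the commit message does not say when, or whether, the bit is cleared.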

NFS: Don't run wake_up_bit() when nobody is waiting... · 301bfa48

Committed by Trond Myklebust on Jul 11, 2017

"perf lock" shows fairly heavy contention for the bit waitqueue locks
when doing an I/O heavy workload.
Use a bit to tell whether or not there has been contention for a lock
so that we can optimise away the bit waitqueue operations in those cases.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

301bfa48

nfs4: add NFSv4 LOOKUPP handlers · 5b5faaf6

Committed by Jeff Layton on Jun 29, 2017

This will be needed in order to implement the get_parent export op
for nfsd.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

5b5faaf6

nfs: add a nfs_ilookup helper · f174ff7a

Committed by Peng Tao on Jun 29, 2017

This helper makes it possible to find an existing NFS inode by its file
handle and fattr.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

f174ff7a

NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration · 8dcbec6d

Committed by Chuck Lever on Jun 08, 2017

Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.

The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).

Normally, when CONFIRMED_R is set, a client purges the lease and
creates a new one. However, that throws away the entire benefit of
Transparent State Migration.

Therefore, the client must not purge that lease when it is possible
that Transparent State Migration has occurred.
Reported-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

8dcbec6d

NFS: Ensure we commit after writeback is complete · 919e3bd9

Committed by Trond Myklebust on Jun 20, 2017

If the page cache is being flushed, then we want to ensure that we
do start a commit once the pages are done being flushed.
If we just wait until all I/O is done to that file, we can end up
livelocking until the balance_dirty_pages() mechanism puts its
foot down and forces I/O to stop.
So instead we do more or less the same thing that O_DIRECT does,
and set up a counter to tell us when the flush is done.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

919e3bd9
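
The counter mentioned in 919e3bd9 can be pictured with a small reference-count model: each write issued by the flush takes a reference, and the commit starts when the last reference is dropped. This is a hedged sketch with hypothetical names (`struct flush_tracker` and its helpers), not the NFS implementation, and a counter stands in for actually sending a COMMIT.

```c
#include <stdatomic.h>

struct flush_tracker {
	atomic_int pending;	/* outstanding writes, plus 1 while submitting */
	int commits_started;	/* stands in for issuing a COMMIT */
};

/* Begin a flush; the submitter holds one reference of its own. */
static void flush_start(struct flush_tracker *t)
{
	atomic_init(&t->pending, 1);
	t->commits_started = 0;
}

/* Drop one reference; whoever drops the last one starts the commit. */
static void flush_put(struct flush_tracker *t)
{
	if (atomic_fetch_sub(&t->pending, 1) == 1)
		t->commits_started++;
}

/* A write belonging to this flush has been issued. */
static void flush_add_write(struct flush_tracker *t)
{
	atomic_fetch_add(&t->pending, 1);
}

/* A write belonging to this flush has completed. */
static void flush_write_done(struct flush_tracker *t)
{
	flush_put(t);
}

/* The submitter has issued every write of this flush. */
static void flush_submitted(struct flush_tracker *t)
{
	flush_put(t);
}
```

The submitter's own reference keeps the count from reaching zero while writes are still being issued, so the commit cannot start early. Because the count covers only this flush, later I/O to the same file does not delay the commit.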

NFS: Remove unused fields in the page I/O structures · b5973a8c

Committed by Trond Myklebust on Jun 20, 2017

Remove the 'layout_private' fields that were only used by the pNFS OSD
layout driver.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b5973a8c

NFS: nfs_rename() - revalidate directories on -ERESTARTSYS · 818a8dbe

Committed by Benjamin Coddington on Jun 16, 2017

An interrupted rename will leave the old dentry behind if the rename
succeeds.  Fix this by forcing a lookup the next time through
->d_revalidate.

A previous attempt at solving this problem took the approach to complete
the work of the rename asynchronously, however that approach was wrong
since it would allow the d_move() to occur after the directory's i_mutex
had been dropped by the original process.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

818a8dbe

NFS: convert flags to bool · a7a3b1e9

Committed by Benjamin Coddington on Jun 20, 2017

NFS uses some int, and unsigned int :1, and bool as flags in structs and
args.  Assert the preference for uniformly replacing these with the bool
type.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

a7a3b1e9

sunrpc: mark all struct svc_version instances as const · aa8217d5

Committed by Christoph Hellwig on May 12, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>

aa8217d5

sunrpc: mark all struct svc_procinfo instances as const · b9c744c1

Committed by Christoph Hellwig on May 12, 2017

struct svc_procinfo contains function pointers, and marking it as
constant prevents it from being used as an attack vector for
code injection.
Signed-off-by: Christoph Hellwig <hch@lst.de>

b9c744c1
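
A generic illustration of the pattern behind b9c744c1: a dispatch table of function pointers declared `const`, so the toolchain can place it in read-only data. The structure and names below are hypothetical and much simpler than struct svc_procinfo.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a procedure descriptor. */
struct proc_info {
	int (*handler)(int arg);
	const char *name;
};

static int proc_null(int arg)
{
	(void)arg;
	return 0;
}

static int proc_double(int arg)
{
	return 2 * arg;
}

/*
 * const lets the table live in read-only data, so a stray or malicious
 * write cannot redirect a handler pointer at run time.
 */
static const struct proc_info proc_table[] = {
	{ .handler = proc_null,   .name = "NULL"   },
	{ .handler = proc_double, .name = "DOUBLE" },
};

/* Look up a procedure by index and call it; -1 for an unknown index. */
static int dispatch(size_t idx, int arg)
{
	if (idx >= sizeof(proc_table) / sizeof(proc_table[0]))
		return -1;
	return proc_table[idx].handler(arg);
}
```

An assignment such as `proc_table[0].handler = proc_double;` is rejected at compile time, which is the property the commit relies on.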

openanolis / cloud-kernel · synced successfully more than 1 year ago