1. 20 Oct 2017 (5 commits)
  2. 18 Oct 2017 (1 commit)
    • KEYS: Fix race between updating and finding a negative key · 363b02da
      Authored by David Howells
      Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection
      error into one field such that:
      
       (1) The instantiation state can be modified/read atomically.
      
       (2) The error can be accessed atomically with the state.
      
       (3) The error isn't stored unioned with the payload pointers.
      
      This deals with the problem that the state is spread over three different
      objects (two bits and a separate variable) and reading or updating them
      atomically isn't practical, given that not only can uninstantiated keys
      change into instantiated or rejected keys, but rejected keys can also turn
      into instantiated keys - and someone accessing the key might not be using
      any locking.
      
      The main side effect of this problem is that what was held in the payload
      may change, depending on the state.  For instance, you might observe the
      key to be in the rejected state.  You then read the cached error, but if
      the key semaphore wasn't locked, the key might've become instantiated
      between the two reads - and you might now have something in hand that isn't
      actually an error code.
      
      The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error
      code if the key is negatively instantiated.  The key_is_instantiated()
      function is replaced with key_is_positive() to avoid confusion as negative
      keys are also 'instantiated'.
      
      Additionally, barriering is included:
      
       (1) Order payload-set before state-set during instantiation.
      
       (2) Order state-read before payload-read when using the key.
      
      Further separate barriering is necessary if RCU is being used to access the
      payload content after reading the payload pointers.
      
      Fixes: 146aa8b1 ("KEYS: Merge the type-specific data with the payload data")
      Cc: stable@vger.kernel.org # v4.4+
      Reported-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      363b02da
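
      A minimal userspace sketch of the ordering described above, using C11
      atomics in place of the kernel's smp_store_release()/smp_load_acquire();
      the struct, function names and state encoding are illustrative, not the
      kernel implementation:

        #include <errno.h>
        #include <stdatomic.h>
        #include <stddef.h>

        /* Assumed encoding: 0 = uninstantiated, 1 = positive, <0 = cached -errno. */
        #define KEY_IS_UNINSTANTIATED  0
        #define KEY_IS_POSITIVE        1

        struct key_sketch {
                _Atomic int state;      /* one word holding state or error */
                void *payload;
        };

        static void key_instantiate(struct key_sketch *k, void *payload)
        {
                k->payload = payload;
                /* (1) payload-set is ordered before state-set */
                atomic_store_explicit(&k->state, KEY_IS_POSITIVE, memory_order_release);
        }

        static void key_reject(struct key_sketch *k, int error)  /* error < 0 */
        {
                atomic_store_explicit(&k->state, error, memory_order_release);
        }

        static void *key_use(struct key_sketch *k, int *error)
        {
                /* (2) state-read is ordered before payload-read */
                int state = atomic_load_explicit(&k->state, memory_order_acquire);

                if (state == KEY_IS_UNINSTANTIATED) {
                        *error = -EWOULDBLOCK;  /* still under construction */
                        return NULL;
                }
                if (state < 0) {
                        *error = state;         /* negative key: cached error */
                        return NULL;
                }
                return k->payload;
        }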
  3. 17 Oct 2017 (1 commit)
    • tun: call dev_get_valid_name() before register_netdevice() · 0ad646c8
      Authored by Cong Wang
      register_netdevice() could fail early when we have an invalid
      dev name, in which case ->ndo_uninit() is not called. For the tun
      device this is a problem, because a timer and other state are already
      initialized and it expects ->ndo_uninit() to clean them up.
      
      We could move these initializations into a ->ndo_init() so
      that register_netdevice() knows better; however, this is still
      complicated due to the logic in tun_detach().
      
      Therefore, I choose to just call dev_get_valid_name() before
      register_netdevice(), which is quicker and much easier to audit.
      And for this specific case, it is already enough.
      
      Fixes: 96442e42 ("tuntap: choose the txq based on rxq")
      Reported-by: Dmitry Alexeev <avekceeb@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ad646c8
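
      A hedged sketch of the resulting ordering inside tun_set_iff(); the
      error labels and the elided initialization are illustrative rather than
      the exact driver source:

        dev = alloc_netdev_mqs(sizeof(struct tun_struct), name,
                               NET_NAME_UNKNOWN, tun_setup, queues, queues);
        if (!dev)
                return -ENOMEM;

        /* Validate the name while nothing needs tearing down yet, so an
         * invalid name can no longer make register_netdevice() fail at a
         * point where ->ndo_uninit() would not be called. */
        err = dev_get_valid_name(net, dev, name);
        if (err < 0)
                goto err_free_dev;

        /* ... timer/flow-table initialization and tun_attach() ... */

        err = register_netdevice(dev);
        if (err < 0)
                goto err_detach;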
  4. 14 Oct 2017 (4 commits)
  5. 13 Oct 2017 (3 commits)
    • genirq: generic chip: remove irq_gc_mask_disable_reg_and_ack() · 0d08af35
      Authored by Doug Berger
      Any usage of the irq_gc_mask_disable_reg_and_ack() function has
      been replaced with the desired functionality.
      
      The incorrect and ambiguously named function is removed here to
      prevent accidental misuse.
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      0d08af35
    • genirq: generic chip: Add irq_gc_mask_disable_and_ack_set() · 20608924
      Authored by Doug Berger
      The irq_gc_mask_disable_reg_and_ack() function name implies that it
      provides the combined functions of irq_gc_mask_disable_reg() and
      irq_gc_ack().  However, the implementation does not actually do
      that since it writes the mask instead of the disable register. It
      also does not maintain the mask cache which makes it inappropriate
      to use with other masking functions.
      
      In addition, commit 659fb32d ("genirq: replace irq_gc_ack() with
      {set,clr}_bit variants (fwd)") effectively renamed irq_gc_ack() to
      irq_gc_ack_set_bit() so this function probably should have also been
      renamed at that time.
      
      The generic chip code currently provides three functions for use
      with the irq_mask member of the irq_chip structure and two functions
      for use with the irq_ack member of the irq_chip structure. These
      functions could be combined into six functions for use with the
      irq_mask_ack member of the irq_chip structure.  However, since only
      one of the combinations is currently used, only the function
      irq_gc_mask_disable_and_ack_set() is added by this commit.
      
      The '_reg' and '_bit' portions of the base function name were left
      out of the new combined function name in an attempt to keep the
      function name manageable within the 80-character source line limit,
      while still allowing the distinct aspects of each combination to be
      captured by the name.
      
      If other combinations are desired in the future please add them to
      the irq generic chip library at that time.
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      20608924
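
      A sketch of what the combined helper is intended to do (see
      kernel/irq/generic-chip.c for the authoritative implementation): write
      the mask to the disable register, keep the mask cache consistent, then
      ack via the set-bit ack register, all under the chip lock:

        /* Sketch only: mask by writing the disable register, update the mask
         * cache so it stays usable with the other irq_gc_mask_* helpers,
         * then ack with a set-bit write. */
        void irq_gc_mask_disable_and_ack_set(struct irq_data *d)
        {
                struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
                struct irq_chip_type *ct = irq_data_get_chip_type(d);
                u32 mask = d->mask;

                irq_gc_lock(gc);
                irq_reg_writel(gc, mask, ct->regs.disable);
                *ct->mask_cache &= ~mask;
                irq_reg_writel(gc, mask, ct->regs.ack);
                irq_gc_unlock(gc);
        }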
    • irqchip/gic-v3-its: Add missing changes to support 52bit physical address · 30ae9610
      Authored by Shanker Donthineni
      The current ITS driver works fine as long as normal memory and GICR
      regions are located within the lower 48bit (>=0 && <2^48) physical
      address space. Some registers (GICR_PEND/PROP, GICR_VPEND/VPROP
      and GITS_CBASER) are handled properly, but not all of them are when
      configuring the hardware with a 52bit physical address.
      
      This patch does the following changes to support 52bit PA.
        -Handle 52bit PA in GITS_BASERn.
        -Fix ITT_addr width to 52bits, bits[51:8].
        -Fix RDbase width to 52bits, bits[51:16].
        -Fix VPT_addr width to 52bits, bits[51:16].
      
      Definition of the GITS_BASERn register when ITS PageSize is 64KB:
        -Bits[47:16] of the register provide bits[47:16] of the table PA.
        -Bits[15:12] of the register provide bits[51:48] of the table PA.
        -Bits[15:00] of the base physical address are 0.
      Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      30ae9610
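
      A standalone C illustration of the 64KB-page encoding quoted above; the
      bit layout is taken from that description and this is not the driver
      code:

        #include <stdint.h>
        #include <stdio.h>

        /* Pack/unpack a 52bit, 64KB-aligned table PA into GITS_BASERn. */
        static uint64_t baser_encode_64k(uint64_t pa)
        {
                uint64_t val = pa & 0x0000ffffffff0000ULL;  /* PA[47:16] -> reg[47:16] */
                val |= ((pa >> 48) & 0xf) << 12;            /* PA[51:48] -> reg[15:12] */
                return val;
        }

        static uint64_t baser_decode_64k(uint64_t val)
        {
                uint64_t pa = val & 0x0000ffffffff0000ULL;  /* reg[47:16] -> PA[47:16] */
                pa |= ((val >> 12) & 0xf) << 48;            /* reg[15:12] -> PA[51:48] */
                return pa;
        }

        int main(void)
        {
                uint64_t pa = 0x000f123456780000ULL;        /* 52bit, 64KB-aligned PA */
                uint64_t reg = baser_encode_64k(pa);

                printf("PA %#llx -> BASER %#llx -> PA %#llx\n",
                       (unsigned long long)pa, (unsigned long long)reg,
                       (unsigned long long)baser_decode_64k(reg));
                return 0;
        }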
  6. 12 Oct 2017 (1 commit)
    • bus: mbus: fix window size calculation for 4GB windows · 2bbbd963
      Authored by Jan Luebbe
      At least the Armada XP SoC supports 4GB on a single DRAM window. Because
      the size register values contain the actual size - 1, the MSB is set in
      that case. For example, the SDRAM window's control register's value is
      0xffffffe1 for 4GB (bits 31 to 24 contain the size).
      
      The MBUS driver reads back each window's size from registers and
      calculates the actual size as (control_reg | ~DDR_SIZE_MASK) + 1, which
      overflows for 32 bit values, resulting in other miscalculations further
      on (a bad RAM window for the CESA crypto engine calculated by
      mvebu_mbus_setup_cpu_target_nooverlap() in my case).
      
      This patch changes the type in 'struct mbus_dram_window' from u32 to
      u64, which allows us to keep using the same register calculation code in
      most MBUS-using drivers (which calculate ->size - 1 again).
      
      Fixes: fddddb52 ("bus: introduce an Marvell EBU MBus driver")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      2bbbd963
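
      The wrap-around is easy to reproduce in isolation; in the small program
      below the mask value is an assumption based on the "bits 31 to 24"
      description above, not the driver's actual define:

        #include <stdint.h>
        #include <stdio.h>

        #define DDR_SIZE_MASK 0xff000000u   /* assumed: size field in bits 31:24 */

        int main(void)
        {
                uint32_t ctrl = 0xffffffe1; /* SDRAM window control value for 4GB */

                /* Old calculation: (ctrl | ~DDR_SIZE_MASK) is 0xffffffff, and
                 * the "+ 1" wraps a 32-bit value back to 0. */
                uint32_t size32 = (ctrl | ~DDR_SIZE_MASK) + 1;

                /* With a 64-bit type the same register arithmetic yields 4GB. */
                uint64_t size64 = (uint64_t)(ctrl | ~DDR_SIZE_MASK) + 1;

                printf("32-bit size: %#x\n", size32);                       /* 0 */
                printf("64-bit size: %#llx\n", (unsigned long long)size64); /* 0x100000000 */
                return 0;
        }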
  7. 10 Oct 2017 (1 commit)
    • sched/core: Fix wake_affine() performance regression · d153b153
      Authored by Peter Zijlstra
      Eric reported a sysbench regression against commit:
      
        3fed382b ("sched/numa: Implement NUMA node level wake_affine()")
      
      Similarly, Rik was looking at the NAS-lu.C benchmark, which regressed
      against his v3.10 enterprise kernel.
      
      PRE (current tip/master):
      
       ivb-ep sysbench:
      
         2: [30 secs]     transactions:                        64110  (2136.94 per sec.)
         5: [30 secs]     transactions:                        143644 (4787.99 per sec.)
        10: [30 secs]     transactions:                        274298 (9142.93 per sec.)
        20: [30 secs]     transactions:                        418683 (13955.45 per sec.)
        40: [30 secs]     transactions:                        320731 (10690.15 per sec.)
        80: [30 secs]     transactions:                        355096 (11834.28 per sec.)
      
       hsw-ex NAS:
      
       OMP_PROC_BIND/lu.C.x_threads_144_run_1.log: Time in seconds =                    18.01
       OMP_PROC_BIND/lu.C.x_threads_144_run_2.log: Time in seconds =                    17.89
       OMP_PROC_BIND/lu.C.x_threads_144_run_3.log: Time in seconds =                    17.93
       lu.C.x_threads_144_run_1.log: Time in seconds =                   434.68
       lu.C.x_threads_144_run_2.log: Time in seconds =                   405.36
       lu.C.x_threads_144_run_3.log: Time in seconds =                   433.83
      
      POST (+patch):
      
       ivb-ep sysbench:
      
         2: [30 secs]     transactions:                        64494  (2149.75 per sec.)
         5: [30 secs]     transactions:                        145114 (4836.99 per sec.)
        10: [30 secs]     transactions:                        278311 (9276.69 per sec.)
        20: [30 secs]     transactions:                        437169 (14571.60 per sec.)
        40: [30 secs]     transactions:                        669837 (22326.73 per sec.)
        80: [30 secs]     transactions:                        631739 (21055.88 per sec.)
      
       hsw-ex NAS:
      
       lu.C.x_threads_144_run_1.log: Time in seconds =                    23.36
       lu.C.x_threads_144_run_2.log: Time in seconds =                    22.96
       lu.C.x_threads_144_run_3.log: Time in seconds =                    22.52
      
      This patch takes out all the shiny wake_affine() stuff and goes back to
      utter basics. Between the two CPUs involved with the wakeup (the CPU
      doing the wakeup and the CPU we ran on previously) pick the CPU we can
      run on _now_.
      
      This recovers much of the regression against the older kernels,
      but still leaves some ground in the overloaded case. The default-enabled
      WA_WEIGHT (which will be introduced in the next patch) is an attempt
      to address the overloaded situation.
      Reported-by: Eric Farman <farman@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jinpuwang@gmail.com
      Cc: vcaputo@pengaru.com
      Fixes: 3fed382b ("sched/numa: Implement NUMA node level wake_affine()")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d153b153
  8. 09 Oct 2017 (1 commit)
    • netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' · 98589a09
      Authored by Shmulik Ladkani
      Commit 2c16d603 ("netfilter: xt_bpf: support ebpf") introduced
      support for attaching an eBPF object by an fd, with the
      'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
      IPT_SO_SET_REPLACE call.
      
      However this breaks subsequent iptables calls:
      
       # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
       # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
       iptables: Invalid argument. Run `dmesg' for more information.
      
      That's because iptables works by loading existing rules using
      IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
      the replacement set.
      
      However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
      (from the initial "iptables -m bpf" invocation) - so when the 2nd
      invocation occurs, userspace passes a bogus fd number, which causes
      'bpf_mt_check_v1' to fail.
      
      One suggested solution [1] was to hack iptables userspace, to perform an
      "entries fixup" immediately after IPT_SO_GET_ENTRIES, by opening a new,
      process-local fd for every 'xt_bpf_info_v1' entry seen.
      
      However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested
      deprecating the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.
      
      This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
      '.fd' and instead perform an in-kernel lookup for the bpf object given
      the provided '.path'.
      
      It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
      XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
      expected to provide the path of the pinned object.
      
      Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.
      
      References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
                  [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
      Reported-by: Rafael Buchbinder <rafi@rbk.ms>
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      98589a09
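
      A hedged sketch of the revised check, only to show the behavioural
      change; __bpf_mt_check_path() and __bpf_mt_check_fd_or_bytecode() are
      hypothetical stand-ins, not the helpers added by the patch:

        /* Sketch only: dispatch on the match mode; the two helpers called
         * below are hypothetical stand-ins for the real checkentry helpers. */
        static int bpf_mt_check_v1(const struct xt_mtchk_param *par)
        {
                struct xt_bpf_info_v1 *info = par->matchinfo;

                switch (info->mode) {
                case XT_BPF_MODE_BYTECODE:
                case XT_BPF_MODE_FD_ELF:
                        /* unchanged: classic bytecode or a non-pinned eBPF fd */
                        return __bpf_mt_check_fd_or_bytecode(info);
                case XT_BPF_MODE_PATH_PINNED:   /* alias of XT_BPF_MODE_FD_PINNED */
                        /* info->fd from userspace is ignored; the pinned object
                         * is resolved in-kernel from info->path instead. */
                        return __bpf_mt_check_path(info->path, &info->filter);
                default:
                        return -EINVAL;
                }
        }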
  9. 04 Oct 2017 (11 commits)
  10. 29 Sep 2017 (7 commits)
  11. 28 Sep 2017 (3 commits)
    • timer: Prepare to change timer callback argument type · 686fef92
      Authored by Kees Cook
      Modern kernel callback systems pass the structure associated with a
      given callback to the callback function. The timer callback remains one
      of the legacy cases where an arbitrary unsigned long argument continues
      to be passed as the callback argument. This has several problems:
      
      - This bloats the timer_list structure with a normally redundant
        .data field.
      
      - No type checking is being performed, forcing callbacks to do
        explicit type casts of the unsigned long argument into the object
        that was passed, rather than using container_of(), as done in most
        of the other callback infrastructure.
      
      - Neighboring buffer overflows can overwrite both the .function and
        the .data field, providing attackers with a way to elevate from a buffer
        overflow into a simplistic ROP-like mechanism that allows calling
        arbitrary functions with a controlled first argument.
      
      - For future Control Flow Integrity work, this creates a unique function
        prototype for timer callbacks, instead of allowing them to continue to
        be clustered with other void functions that take a single unsigned long
        argument.
      
      This adds a new timer initialization API, which will ultimately replace
      the existing setup_timer(), setup_{deferrable,pinned,etc}_timer() family,
      named timer_setup() (to mirror hrtimer_setup(), making instances of its
      use much easier to grep for).
      
      In order to support the migration of existing timers into the new
      callback arguments, timer_setup() casts its arguments to the existing
      legacy types, and explicitly passes the timer pointer as the legacy
      data argument. Once all setup_*timer() callers have been replaced with
      timer_setup(), the casts can be removed, and the data argument can be
      dropped with the timer expiration code changed to just pass the timer
      to the callback directly.
      
      Since the regular pattern of using container_of() during local variable
      declaration repeats the need for the variable type declaration
      to be included, this adds a helper modeled after other from_*()
      helpers that wrap container_of(), named from_timer(). This helper uses
      typeof(*variable), removing the type redundancy and minimizing the need
      for line wraps in forthcoming conversions from "unsigned long data" to
      "struct timer_list *" in the timer callbacks:
      
      -void callback(unsigned long data)
      +void callback(struct timer_list *t)
      {
      -   struct some_data_structure *local = (struct some_data_structure *)data;
      +   struct some_data_structure *local = from_timer(local, t, timer);
      
      Finally, in order to support the handful of timer users that perform
      open-coded assignments of the .function (and .data) fields, provide
      cast macros (TIMER_FUNC_TYPE and TIMER_DATA_TYPE) that can be used
      temporarily. Once conversion has been completed, these can be trivially
      removed globally.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20170928133817.GA113410@beast
      686fef92
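
      A minimal conversion sketch using the new API; the struct, field and
      callback names below are illustrative, not taken from the patch:

        #include <linux/timer.h>

        struct foo {
                struct timer_list timer;
                unsigned long pending;
        };

        /* New-style callback: it receives the timer itself and recovers the
         * enclosing object with from_timer(), no unsigned long cast needed. */
        static void foo_timeout(struct timer_list *t)
        {
                struct foo *f = from_timer(f, t, timer);

                f->pending = 0;
        }

        static void foo_init(struct foo *f)
        {
                /* Replaces: setup_timer(&f->timer, foo_timeout, (unsigned long)f); */
                timer_setup(&f->timer, foo_timeout, 0);
        }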
    • net/mlx5: Check device capability for maximum flow counters · 16f1c5bb
      Authored by Raed Salem
      Add a check for the maximum number of flow counters that can be
      attached to a rule (FTE).
      
      Fixes: bd5251db ('net/mlx5_core: Introduce flow steering destination of type counter')
      Signed-off-by: Raed Salem <raeds@mellanox.com>
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      16f1c5bb
    • net/mlx5: Fix FPGA capability location · 99d3cd27
      Authored by Inbar Karmy
      Currently, the FPGA capability is located in (mdev)->caps.hca_cur;
      change its location to (mdev)->caps.fpga, since hca_cur is reserved
      for HCA device capabilities.
      
      Fixes: e29341fb ("net/mlx5: FPGA, Add basic support for Innova")
      Signed-off-by: Inbar Karmy <inbark@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      99d3cd27
  12. 27 Sep 2017 (1 commit)
  13. 26 Sep 2017 (1 commit)
    • smp/hotplug: Hotplug state fail injection · 1db49484
      Authored by Peter Zijlstra
      Add a sysfs file that makes a specific hotplug state fail once. This
      can be used to test the state rollback code paths.
      
      Something like this (hotplug-up.sh):
      
        #!/bin/bash
      
        echo 0 > /debug/sched_debug
        echo 1 > /debug/tracing/events/cpuhp/enable
      
        ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1`
        STATES=${1:-$ALL_STATES}
      
        for state in $STATES
        do
      	  echo 0 > /sys/devices/system/cpu/cpu1/online
      	  echo 0 > /debug/tracing/trace
      	  echo Fail state: $state
      	  echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail
      	  cat /sys/devices/system/cpu/cpu1/hotplug/fail
      	  echo 1 > /sys/devices/system/cpu/cpu1/online
      
      	  cat /debug/tracing/trace > hotfail-${state}.trace
      
      	  sleep 1
        done
      
      This can be used to test all possible rollback scenarios (barring
      multi-instance) on CPU-up; CPU-down is a trivial modification of the
      above.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.972581715@infradead.org
      
      1db49484