1. 17 12月, 2018 1 次提交
    • L
      mmc: sdhci: imx: Use the slot GPIO descriptor · 74ff81e1
      Linus Walleij 提交于
      Simplify things by making the i.MX SDHCI driver just use
      slot GPIO with descriptors instead of passing around the global
      GPIO numbers that we want to get rid of.
      
      As it turns out, just one single board is using the platform
      data to pass in GPIOs numbers for CD and WP, so we augment this
      to use a machine descriptor table instead.
      
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: Sascha Hauer <s.hauer@pengutronix.de>
      Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
      Cc: Fabio Estevam <fabio.estevam@nxp.com>
      Cc: NXP Linux Team <linux-imx@nxp.com>
      Cc: Bartosz Golaszewski <brgl@bgdev.pl>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Reviewed-by: NDong Aisheng <aisheng.dong@nxp.com>
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      74ff81e1
  2. 15 12月, 2018 2 次提交
  3. 09 12月, 2018 1 次提交
    • D
      Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask" · 356ff8a9
      David Rientjes 提交于
      This reverts commit 89c83fb5.
      
      This should have been done as part of 2f0799a0 ("mm, thp: restore
      node-local hugepage allocations").  The movement of the thp allocation
      policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was
      intended to only set __GFP_THISNODE for mempolicies that are not
      MPOL_BIND whereas the revert could set this regardless of mempolicy.
      
      While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask()
      and alloc_pages_vma() was racy, that has since been removed since the
      revert.  What is left is the possibility to use __GFP_THISNODE in
      policy_node() when it is unexpected because the special handling for
      hugepages in alloc_pages_vma()  was removed as part of the consolidation.
      
      Secondly, prior to 89c83fb5, alloc_pages_vma() implemented a somewhat
      different policy for hugepage allocations, which were allocated through
      alloc_hugepage_vma().  For hugepage allocations, if the allocating
      process's node is in the set of allowed nodes, allocate with
      __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with
      __GFP_THISNODE instead).  This was changed for shmem_alloc_hugepage() to
      allow fallback to other nodes in 89c83fb5 as it did for new_page() in
      mm/mempolicy.c which is functionally different behavior and removes the
      requirement to only allocate hugepages locally.
      
      So this commit does a full revert of 89c83fb5 instead of the partial
      revert that was done in 2f0799a0.  The result is the same thp
      allocation policy for 4.20 that was in 4.19.
      
      Fixes: 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
      Fixes: 2f0799a0 ("mm, thp: restore node-local hugepage allocations")
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      356ff8a9
  4. 06 12月, 2018 3 次提交
  5. 05 12月, 2018 2 次提交
    • J
      USB: serial: console: fix reported terminal settings · f51ccf46
      Johan Hovold 提交于
      The USB-serial console implementation has never reported the actual
      terminal settings used. Despite storing the corresponding cflags in its
      struct console, these were never honoured on later tty open() where the
      tty termios would be left initialised to the driver defaults.
      
      Unlike the serial console implementation, the USB-serial code calls
      subdriver open() already at console setup. While calling set_termios()
      and write() before open() looks like it could work for some USB-serial
      drivers, others definitely do not expect this, so modelling this after
      serial core is going to be intrusive, if at all possible.
      
      Instead, use a (renamed) tty helper to save the termios data used at
      console setup so that the tty termios reflects the actual terminal
      settings after a subsequent tty open().
      
      Note that the calls to tty_init_termios() (tty_driver_install()) and
      tty_save_termios() are serialised using the disconnect mutex.
      
      This specifically fixes a regression that was triggered by a recent
      change adding software flow control to the pl2303 driver: a getty trying
      to disable flow control while leaving the baud rate unchanged would now
      also set the baud rate to the driver default (prior to the flow-control
      change this had been a noop).
      
      Fixes: 7041d9c3 ("USB: serial: pl2303: add support for tx xon/xoff flow control")
      Cc: stable <stable@vger.kernel.org>	# 4.18
      Cc: Florian Zumbiehl <florz@florz.de>
      Reported-by: NJarkko Nikula <jarkko.nikula@linux.intel.com>
      Tested-by: NJarkko Nikula <jarkko.nikula@linux.intel.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJohan Hovold <johan@kernel.org>
      f51ccf46
    • M
      dax: Fix unlock mismatch with updated API · 27359fd6
      Matthew Wilcox 提交于
      Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
      store a replacement entry in the Xarray at the given xas-index with the
      DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
      value of the entry relative to the current Xarray state to be specified.
      
      In most contexts dax_unlock_entry() is operating in the same scope as
      the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
      case the implementation needs to recall the original entry. In the case
      where the original entry is a 'pmd' entry it is possible that the pfn
      performed to do the lookup is misaligned to the value retrieved in the
      Xarray.
      
      Change the api to return the unlock cookie from dax_lock_page() and pass
      it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was
      assuming that the page was PMD-aligned if the entry was a PMD entry with
      signatures like:
      
       WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
       RIP: 0010:dax_insert_entry+0x2b2/0x2d0
       [..]
       Call Trace:
        dax_iomap_pte_fault.isra.41+0x791/0xde0
        ext4_dax_huge_fault+0x16f/0x1f0
        ? up_read+0x1c/0xa0
        __do_fault+0x1f/0x160
        __handle_mm_fault+0x1033/0x1490
        handle_mm_fault+0x18b/0x3d0
      
      Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
      Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
      Reported-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NMatthew Wilcox <willy@infradead.org>
      Tested-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      27359fd6
  6. 03 12月, 2018 1 次提交
    • D
      Drivers: hv: vmbus: Offload the handling of channels to two workqueues · 37c2578c
      Dexuan Cui 提交于
      vmbus_process_offer() mustn't call channel->sc_creation_callback()
      directly for sub-channels, because sc_creation_callback() ->
      vmbus_open() may never get the host's response to the
      OPEN_CHANNEL message (the host may rescind a channel at any time,
      e.g. in the case of hot removing a NIC), and vmbus_onoffer_rescind()
      may not wake up the vmbus_open() as it's blocked due to a non-zero
      vmbus_connection.offer_in_progress, and finally we have a deadlock.
      
      The above is also true for primary channels, if the related device
      drivers use sync probing mode by default.
      
      And, usually the handling of primary channels and sub-channels can
      depend on each other, so we should offload them to different
      workqueues to avoid possible deadlock, e.g. in sync-probing mode,
      NIC1's netvsc_subchan_work() can race with NIC2's netvsc_probe() ->
      rtnl_lock(), and causes deadlock: the former gets the rtnl_lock
      and waits for all the sub-channels to appear, but the latter
      can't get the rtnl_lock and this blocks the handling of sub-channels.
      
      The patch can fix the multiple-NIC deadlock described above for
      v3.x kernels (e.g. RHEL 7.x) which don't support async-probing
      of devices, and v4.4, v4.9, v4.14 and v4.18 which support async-probing
      but don't enable async-probing for Hyper-V drivers (yet).
      
      The patch can also fix the hang issue in sub-channel's handling described
      above for all versions of kernels, including v4.19 and v4.20-rc4.
      
      So actually the patch should be applied to all the existing kernels,
      not only the kernels that have 8195b139.
      
      Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug")
      Cc: stable@vger.kernel.org
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37c2578c
  7. 02 12月, 2018 1 次提交
  8. 01 12月, 2018 2 次提交
    • D
      bpf: fix pointer offsets in context for 32 bit · b7df9ada
      Daniel Borkmann 提交于
      Currently, pointer offsets in three BPF context structures are
      broken in two scenarios: i) 32 bit compiled applications running
      on 64 bit kernels, and ii) LLVM compiled BPF programs running
      on 32 bit kernels. The latter is due to BPF target machine being
      strictly 64 bit. So in each of the cases the offsets will mismatch
      in verifier when checking / rewriting context access. Fix this by
      providing a helper macro __bpf_md_ptr() that will enforce padding
      up to 64 bit and proper alignment, and for context access a macro
      bpf_ctx_range_ptr() which will cover full 64 bit member range on
      32 bit archs. For flow_keys, we additionally need to force the
      size check to sizeof(__u64) as with other pointer types.
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Fixes: 4f738adb ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
      Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
      Reported-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Tested-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      b7df9ada
    • J
      psi: make disabling/enabling easier for vendor kernels · e0c27447
      Johannes Weiner 提交于
      Mel Gorman reports a hackbench regression with psi that would prohibit
      shipping the suse kernel with it default-enabled, but he'd still like
      users to be able to opt in at little to no cost to others.
      
      With the current combination of CONFIG_PSI and the psi_disabled bool set
      from the commandline, this is a challenge.  Do the following things to
      make it easier:
      
      1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
         to enable CONFIG_PSI in their kernel but leave the feature disabled
         unless a user requests it at boot-time.
      
         To avoid double negatives, rename psi_disabled= to psi=.
      
      2. Make psi_disabled a static branch to eliminate any branch costs
         when the feature is disabled.
      
      In terms of numbers before and after this patch, Mel says:
      
      : The following is a comparision using CONFIG_PSI=n as a baseline against
      : your patch and a vanilla kernel
      :
      :                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
      :                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
      : Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
      : Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
      : Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
      : Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
      : Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
      : Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
      : Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
      : Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
      : Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
      :
      : It's showing that the vanilla kernel takes a hit (as the bisection
      : indicated it would) and that disabling PSI by default is reasonably
      : close in terms of performance for this particular workload on this
      : particular machine so;
      
      Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Tested-by: NMel Gorman <mgorman@techsingularity.net>
      Reported-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e0c27447
  9. 30 11月, 2018 3 次提交
  10. 28 11月, 2018 7 次提交
    • K
      fscache: Fix race in fscache_op_complete() due to split atomic_sub & read · 3f2b7b90
      kiran.modukuri 提交于
      The code in fscache_retrieval_complete is using atomic_sub followed by an
      atomic_read:
      
              atomic_sub(n_pages, &op->n_pages);
              if (atomic_read(&op->n_pages) <= 0)
                      fscache_op_complete(&op->op, true);
      
      This causes two threads doing a decrement of n_pages to race with each
      other seeing the op->refcount 0 at same time - and they end up calling
      fscache_op_complete() in both the threads leading to an assertion failure.
      
      Fix this by using atomic_sub_return_relaxed() instead of two calls.  Note
      that I'm using 'relaxed' rather than, say, 'release' as there aren't
      multiple variables that appear to need ordering across the release.
      
      The oops looks something like:
      
      FS-Cache: Assertion failed
      FS-Cache: 0 > 0 is false
      ...
      kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
      ...
      Workqueue: fscache_operation fscache_op_work_func [fscache]
      ...
      RIP: 0010:[<ffffffffc037eacd>] fscache_op_complete+0x10d/0x180 [fscache]
      ...
      Call Trace:
       [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
       [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
       [<ffffffff81096da0>] process_one_work+0x150/0x3f0
       [<ffffffff8109751a>] worker_thread+0x11a/0x470
       [<ffffffff81808e59>] ? __schedule+0x359/0x980
       [<ffffffff81097400>] ? rescuer_thread+0x310/0x310
       [<ffffffff8109cdd6>] kthread+0xd6/0xf0
       [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
       [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
       [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
      
      This seen this in 4.4.x kernels and the same bug affects fscache in latest
      upstreams kernels.
      
      Fixes: 1bb4b7f9 ("FS-Cache: The retrieval remaining-pages counter needs to be atomic_t")
      Signed-off-by: NKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      3f2b7b90
    • T
      x86/speculation: Add prctl() control for indirect branch speculation · 9137bb27
      Thomas Gleixner 提交于
      Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and
      PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of
      indirect branch speculation via STIBP and IBPB.
      
      Invocations:
       Check indirect branch speculation status with
       - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
      
       Enable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
      
       Disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
      
       Force disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
      
      See Documentation/userspace-api/spec_ctrl.rst.
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
      9137bb27
    • T
      ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS · 46f7ecb1
      Thomas Gleixner 提交于
      The IBPB control code in x86 removed the usage. Remove the functionality
      which was introduced for this.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de
      
      46f7ecb1
    • T
      x86/speculation: Rework SMT state change · a74cfffb
      Thomas Gleixner 提交于
      arch_smt_update() is only called when the sysfs SMT control knob is
      changed. This means that when SMT is enabled in the sysfs control knob the
      system is considered to have SMT active even if all siblings are offline.
      
      To allow finegrained control of the speculation mitigations, the actual SMT
      state is more interesting than the fact that siblings could be enabled.
      
      Rework the code, so arch_smt_update() is invoked from each individual CPU
      hotplug function, and simplify the update function while at it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
      
      a74cfffb
    • T
      sched/smt: Expose sched_smt_present static key · 321a874a
      Thomas Gleixner 提交于
      Make the scheduler's 'sched_smt_present' static key globaly available, so
      it can be used in the x86 speculation control code.
      
      Provide a query function and a stub for the CONFIG_SMP=n case.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de
      321a874a
    • S
      function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack · 39eb456d
      Steven Rostedt (VMware) 提交于
      Currently, the depth of the ret_stack is determined by curr_ret_stack index.
      The issue is that there's a race between setting of the curr_ret_stack and
      calling of the callback attached to the return of the function.
      
      Commit 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling
      trace return callback") moved the calling of the callback to after the
      setting of the curr_ret_stack, even stating that it was safe to do so, when
      in fact, it was the reason there was a barrier() there (yes, I should have
      commented that barrier()).
      
      Not only does the curr_ret_stack keep track of the current call graph depth,
      it also keeps the ret_stack content from being overwritten by new data.
      
      The function profiler, uses the "subtime" variable of ret_stack structure
      and by moving the curr_ret_stack, it allows for interrupts to use the same
      structure it was using, corrupting the data, and breaking the profiler.
      
      To fix this, there needs to be two variables to handle the call stack depth
      and the pointer to where the ret_stack is being used, as they need to change
      at two different locations.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      39eb456d
    • S
      function_graph: Make ftrace_push_return_trace() static · d125f3f8
      Steven Rostedt (VMware) 提交于
      As all architectures now call function_graph_enter() to do the entry work,
      no architecture should ever call ftrace_push_return_trace(). Make it static.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      d125f3f8
  11. 27 11月, 2018 2 次提交
    • D
      bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addr · e2c95a61
      Daniel Borkmann 提交于
      Make fetching of the BPF call address from ppc64 JIT generic. ppc64
      was using a slightly different variant rather than through the insns'
      imm field encoding as the target address would not fit into that space.
      Therefore, the target subprog number was encoded into the insns' offset
      and fetched through fp->aux->func[off]->bpf_func instead. Given there
      are other JITs with this issue and the mechanism of fetching the address
      is JIT-generic, move it into the core as a helper instead. On the JIT
      side, we get information on whether the retrieved address is a fixed
      one, that is, not changing through JIT passes, or a dynamic one. For
      the former, JITs can optimize their imm emission because this doesn't
      change jump offsets throughout JIT process.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: NSandipan Das <sandipan@linux.ibm.com>
      Tested-by: NSandipan Das <sandipan@linux.ibm.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      e2c95a61
    • S
      function_graph: Create function_graph_enter() to consolidate architecture code · 8114865f
      Steven Rostedt (VMware) 提交于
      Currently all the architectures do basically the same thing in preparing the
      function graph tracer on entry to a function. This code can be pulled into a
      generic location and then this will allow the function graph tracer to be
      fixed, as well as extended.
      
      Create a new function graph helper function_graph_enter() that will call the
      hook function (ftrace_graph_entry) and the shadow stack operation
      (ftrace_push_return_trace), and remove the need of the architecture code to
      manage the shadow stack.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      8114865f
  12. 26 11月, 2018 2 次提交
    • B
      gpio: davinci: restore a way to manually specify the GPIO base · 786a9ab1
      Bartosz Golaszewski 提交于
      Commit 587f7a69 ("gpio: davinci: Use dev name for label and
      automatic base selection") broke the network support in legacy boot
      mode for da850-evm since we can no longer request the MDIO clock GPIO.
      
      Other boards may be broken too, which I haven't tested.
      
      The problem is in the fact that most board files still use the legacy
      GPIO API where lines are requested by numbers rather than descriptors.
      
      While this should be fixed eventually, in order to unbreak the board
      for now - provide a way to manually specify the GPIO base in platform
      data.
      
      Fixes: 587f7a69 ("gpio: davinci: Use dev name for label and automatic base selection")
      Cc: stable@vger.kernel.org
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NSekhar Nori <nsekhar@ti.com>
      786a9ab1
    • F
      netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, too · 89259088
      Florian Westphal 提交于
      syzbot was able to trigger the WARN in cttimeout_default_get() by
      passing UDPLITE as l4protocol.  Alias UDPLITE to UDP, both use
      same timeout values.
      
      Furthermore, also fetch GRE timeouts.  GRE is a bit more complicated,
      as it still can be a module and its netns_proto_gre struct layout isn't
      visible outside of the gre module. Can't move timeouts around, it
      appears conntrack sysctl unregister assumes net_generic() returns
      nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead.
      
      A followup nf-next patch could make gre tracker be built-in as well
      if needed, its not that large.
      
      Last, make the WARN() mention the missing protocol value in case
      anything else is missing.
      
      Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
      Fixes: 8866df92 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      89259088
  13. 24 11月, 2018 1 次提交
    • W
      packet: copy user buffers before orphan or clone · 5cd8d46e
      Willem de Bruijn 提交于
      tpacket_snd sends packets with user pages linked into skb frags. It
      notifies that pages can be reused when the skb is released by setting
      skb->destructor to tpacket_destruct_skb.
      
      This can cause data corruption if the skb is orphaned (e.g., on
      transmit through veth) or cloned (e.g., on mirror to another psock).
      
      Create a kernel-private copy of data in these cases, same as tun/tap
      zerocopy transmission. Reuse that infrastructure: mark the skb as
      SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
      
      Unlike other zerocopy packets, do not set shinfo destructor_arg to
      struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
      when the original skb is released and a timestamp is recorded. Do not
      change this timestamp behavior. The ubuf_info->callback is not needed
      anyway, as no zerocopy notification is expected.
      
      Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
      resulting value is not a valid ubuf_info pointer, nor a valid
      tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
      
      The fix relies on features introduced in commit 52267790 ("sock:
      add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
      
      Tested with from `./in_netns.sh ./txring_overwrite` from
      http://github.com/wdebruij/kerneltools/tests
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Reported-by: NAnand H. Krishnan <anandhkrishnan@gmail.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5cd8d46e
  14. 23 11月, 2018 1 次提交
    • T
      net/dim: Update DIM start sample after each DIM iteration · 0211dda6
      Tal Gilboa 提交于
      On every iteration of net_dim, the algorithm may choose to
      check for the system state by comparing current data sample
      with previous data sample. After each of these comparison,
      regardless of the action taken, the sample used as baseline
      is needed to be updated.
      
      This patch fixes a bug that causes DIM to take wrong decisions,
      due to never updating the baseline sample for comparison between
      iterations. This way, DIM always compares current sample with
      zeros.
      
      Although this is a functional fix, it also improves and stabilizes
      performance as the algorithm works properly now.
      
      Performance:
      Tested single UDP TX stream with pktgen:
      samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1
      -m 24:8a:07:88:26:8b -f 3 -b 128
      
      ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps.
      Also, toggling between profiles is less frequent with the fix.
      
      Fixes: 8115b750 ("net/dim: use struct net_dim_sample as arg to net_dim")
      Signed-off-by: NTal Gilboa <talgi@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0211dda6
  15. 22 11月, 2018 3 次提交
  16. 16 11月, 2018 1 次提交
  17. 15 11月, 2018 1 次提交
  18. 12 11月, 2018 1 次提交
  19. 10 11月, 2018 3 次提交
  20. 09 11月, 2018 1 次提交
  21. 08 11月, 2018 1 次提交