1. 22 Dec, 2018 3 commits
  2. 14 Dec, 2018 5 commits
  3. 05 Dec, 2018 1 commit
    • D
      acpi/nfit: Add support for Intel DSM 1.8 commands · b3ed2ce0
      Dave Jiang committed
      Add command definition for security commands defined in Intel DSM
      specification v1.8 [1]. This includes "get security state", "set
      passphrase", "unlock unit", "freeze lock", "secure erase", "overwrite",
      "overwrite query", "master passphrase enable/disable", and "master
      erase". Since this adds several Intel definitions, move the relevant
      bits to their own header.
      
      These commands mutate physical data, but that manipulation is not cache
      coherent. The requirement to flush and invalidate caches makes these
      commands unsuitable to be called from userspace, so extra logic is added
      to detect and block these commands from being submitted via the ioctl
      command submission path.
      
      Lastly, the commands may contain sensitive key material that should not
      be dumped in a standard debug session. Update the nvdimm-command
      payload-dump facility to move security command payloads behind a
      default-off compile time switch.
      
      [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf

      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      b3ed2ce0
  4. 01 Dec, 2018 1 commit
    • J
      psi: make disabling/enabling easier for vendor kernels · e0c27447
      Johannes Weiner committed
      Mel Gorman reports a hackbench regression with psi that would prohibit
      shipping the suse kernel with it default-enabled, but he'd still like
      users to be able to opt in at little to no cost to others.
      
      With the current combination of CONFIG_PSI and the psi_disabled bool set
      from the commandline, this is a challenge.  Do the following things to
      make it easier:
      
      1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
         to enable CONFIG_PSI in their kernel but leave the feature disabled
         unless a user requests it at boot-time.
      
         To avoid double negatives, rename psi_disabled= to psi=.
      
      2. Make psi_disabled a static branch to eliminate any branch costs
         when the feature is disabled.
      
      In terms of numbers before and after this patch, Mel says:
      
      : The following is a comparison using CONFIG_PSI=n as a baseline against
      : your patch and a vanilla kernel
      :
      :                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
      :                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
      : Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
      : Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
      : Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
      : Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
      : Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
      : Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
      : Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
      : Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
      : Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
      :
      : It's showing that the vanilla kernel takes a hit (as the bisection
      : indicated it would) and that disabling PSI by default is reasonably
      : close in terms of performance for this particular workload on this
      : particular machine so;
      
      Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Mel Gorman <mgorman@techsingularity.net>
      Reported-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e0c27447
  5. 30 Nov, 2018 3 commits
    • Z
      tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique · 0c7a52e4
      Zenghui Yu committed
      After enabling KVM event tracing, almost all of trace_kvm_exit()'s
      printk shows
      
      	"kvm_exit: IRQ: ..."
      
      even if the actual exception_type is NOT IRQ.  More specifically,
      trace_kvm_exit() is defined in virt/kvm/arm/trace.h by TRACE_EVENT.
      
      This slight problem may have existed since commit e6753f23
      ("tracepoint: Make rcuidle tracepoint callers use SRCU"). There are
      two variables in trace_kvm_exit() and __DO_TRACE() which have the
      same name, *idx*. Thus the actual value of *idx* will be overwritten
      when tracing. Fix it by adding a simple prefix.
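
The name clash can be reproduced outside the kernel. The following is a minimal sketch, not the kernel's actual macro: `srcu_lock_stub()`, `DO_TRACE_BUGGY`, `DO_TRACE_FIXED`, `trace_buggy()`, `trace_fixed()` and `idx_shadow_demo()` are hypothetical names chosen for illustration. It shows how a local `idx` declared inside a macro body captures a caller argument that is also named `idx`.

```c
#include <assert.h>

/* Hypothetical stand-in for a lock call that returns an index (always 0 here). */
int srcu_lock_stub(void)
{
    return 0;
}

/* Buggy variant: the macro-local 'idx' hides any 'idx' that appears in 'expr'. */
#define DO_TRACE_BUGGY(out, expr)        \
    do {                                 \
        int idx = srcu_lock_stub();      \
        (out) = (expr);                  \
        (void)idx;                       \
    } while (0)

/* Fixed variant: a prefixed name no longer collides with the caller's variable. */
#define DO_TRACE_FIXED(out, expr)        \
    do {                                 \
        int __idx = srcu_lock_stub();    \
        (out) = (expr);                  \
        (void)__idx;                     \
    } while (0)

/* The caller's argument is named 'idx', as in the trace_kvm_exit() case. */
int trace_buggy(int idx)
{
    int seen;
    DO_TRACE_BUGGY(seen, idx);   /* 'idx' resolves to the macro's local */
    return seen;
}

int trace_fixed(int idx)
{
    int seen;
    DO_TRACE_FIXED(seen, idx);   /* 'idx' resolves to the parameter */
    return seen;
}

/* Small self-check showing the difference between the two variants. */
void idx_shadow_demo(void)
{
    assert(trace_buggy(7) == 0);
    assert(trace_fixed(7) == 7);
}
```

Double-underscore names are reserved for the implementation in ordinary C programs; the sketch keeps `__idx` only to mirror the kernel's naming convention.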
      
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Wang Haibin <wanghaibin.wang@huawei.com>
      Cc: linux-trace-devel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: e6753f23 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      0c7a52e4
    • K
      pstore/ram: Correctly calculate usable PRZ bytes · 89d328f6
      Kees Cook committed
      The actual number of bytes stored in a PRZ is smaller than the
      bytes requested by platform data, since there is a header on each
      PRZ. Additionally, if ECC is enabled, there are trailing bytes used
      as well. Normally this mismatch doesn't matter since PRZs are circular
      buffers and the leading "overflow" bytes are just thrown away. However, in
      the case of a compressed record, this rather badly corrupts the results.
      
      This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1".
      Any stored crashes could not be decompressed (producing a pstorefs
      "dmesg-*.enc.z" file), and triggered errors at boot:
      
        [    2.790759] pstore: crypto_comp_decompress failed, ret = -22!
      
      Backporting this depends on commit 70ad35db ("pstore: Convert console
      write to use ->write_buf")
      Reported-by: Joel Fernandes <joel@joelfernandes.org>
      Fixes: b0aad7a9 ("pstore: Add compression support to pstore")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      89d328f6
    • I
      Revert "xen/balloon: Mark unallocated host memory as UNUSABLE" · 12366410
      Igor Druzhinin committed
      This reverts commit b3cf8528.
      
      That commit unintentionally broke Xen balloon memory hotplug with
      "hotplug_unpopulated" set to 1. Because the "System RAM" resource
      was assigned under a new "Unusable memory" resource in the IO/Mem tree,
      any attempt to online this memory would fail due to the general kernel
      restriction that "System RAM" resources be first-level only.
      
      The original issue that commit tried to work around, fa564ad9
      ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f,
      60-7f)"), was also amended by the follow-up 03a55173 ("x86/PCI: Move
      and shrink AMD 64-bit window to avoid conflict"), which made the
      original fix to Xen ballooning unnecessary.
      Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      12366410
  6. 28 Nov, 2018 8 commits
    • K
      fscache: Fix race in fscache_op_complete() due to split atomic_sub & read · 3f2b7b90
      kiran.modukuri committed
      The code in fscache_retrieval_complete is using atomic_sub followed by an
      atomic_read:
      
              atomic_sub(n_pages, &op->n_pages);
              if (atomic_read(&op->n_pages) <= 0)
                      fscache_op_complete(&op->op, true);
      
      This causes two threads decrementing n_pages to race with each
      other: both can see op->n_pages at 0 at the same time, and both end up
      calling fscache_op_complete(), leading to an assertion failure.
      
      Fix this by using atomic_sub_return_relaxed() instead of two calls.  Note
      that I'm using 'relaxed' rather than, say, 'release' as there aren't
      multiple variables that appear to need ordering across the release.
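
The difference between the two patterns can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the fscache code: `struct retrieval_op`, `complete_racy()` and `complete_fixed()` are hypothetical names, and `atomic_fetch_sub_explicit(...) - n` is used here as the C11 counterpart of the kernel's `atomic_sub_return_relaxed()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the retrieval operation; only the counter matters. */
struct retrieval_op {
    atomic_int n_pages;   /* pages still outstanding */
};

/*
 * Racy pattern: the subtraction and the read are two separate atomic
 * operations. Another thread can decrement in between, so two callers
 * may both observe a value <= 0 and both report completion.
 */
bool complete_racy(struct retrieval_op *op, int n)
{
    assert(n > 0);
    atomic_fetch_sub_explicit(&op->n_pages, n, memory_order_relaxed);
    return atomic_load_explicit(&op->n_pages, memory_order_relaxed) <= 0;
}

/*
 * Fixed pattern: decide from the result of this caller's own decrement.
 * atomic_fetch_sub returns the old value, so old - n is the new value.
 */
bool complete_fixed(struct retrieval_op *op, int n)
{
    assert(n > 0);
    int remaining =
        atomic_fetch_sub_explicit(&op->n_pages, n, memory_order_relaxed) - n;
    return remaining <= 0;
}
```

As long as the decrements add up to the initial count, exactly one caller of `complete_fixed()` sees the counter reach zero, whichever thread performs the last decrement.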
      
      The oops looks something like:
      
      FS-Cache: Assertion failed
      FS-Cache: 0 > 0 is false
      ...
      kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
      ...
      Workqueue: fscache_operation fscache_op_work_func [fscache]
      ...
      RIP: 0010:[<ffffffffc037eacd>] fscache_op_complete+0x10d/0x180 [fscache]
      ...
      Call Trace:
       [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
       [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
       [<ffffffff81096da0>] process_one_work+0x150/0x3f0
       [<ffffffff8109751a>] worker_thread+0x11a/0x470
       [<ffffffff81808e59>] ? __schedule+0x359/0x980
       [<ffffffff81097400>] ? rescuer_thread+0x310/0x310
       [<ffffffff8109cdd6>] kthread+0xd6/0xf0
       [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
       [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
       [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
      
      This was seen in 4.4.x kernels, and the same bug affects fscache in the
      latest upstream kernels.
      
      Fixes: 1bb4b7f9 ("FS-Cache: The retrieval remaining-pages counter needs to be atomic_t")
      Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      3f2b7b90
    • T
      x86/speculation: Add prctl() control for indirect branch speculation · 9137bb27
      Thomas Gleixner committed
      Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and
      PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of
      indirect branch speculation via STIBP and IBPB.
      
      Invocations:
       Check indirect branch speculation status with
       - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
      
       Enable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
      
       Disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
      
       Force disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
      
      See Documentation/userspace-api/spec_ctrl.rst.
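
A userspace query based on the invocations above might look like the following sketch. The helper names `indirect_branch_state()`, `spec_prctl_available()` and `spec_enabled()` are hypothetical, and the `#ifndef` fallbacks are only there for older userspace headers, assuming the Linux UAPI constant values.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers; values follow the Linux UAPI. */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_PRCTL   (1UL << 0)
#define PR_SPEC_ENABLE  (1UL << 1)
#define PR_SPEC_DISABLE (1UL << 2)
#endif

/* Returns the state bitmask for the calling task, or a negative errno value. */
long indirect_branch_state(void)
{
    int ret = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                    0UL, 0UL, 0UL);
    return ret < 0 ? -(long)errno : (long)ret;
}

/* Nonzero when the kernel reports that per-task control via prctl() is possible. */
int spec_prctl_available(long state)
{
    return state > 0 && ((unsigned long)state & PR_SPEC_PRCTL) != 0;
}

/* Nonzero when indirect branch speculation is enabled (mitigation off). */
int spec_enabled(long state)
{
    return state > 0 && ((unsigned long)state & PR_SPEC_ENABLE) != 0;
}
```

On kernels or CPUs without this control the query is expected to fail, so callers should treat a negative return value as "unsupported" rather than as an error worth aborting on.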
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
      9137bb27
    • T
      ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS · 46f7ecb1
      Thomas Gleixner committed
      The IBPB control code in x86 removed the usage. Remove the functionality
      which was introduced for this.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de
      
      46f7ecb1
    • T
      x86/speculation: Rework SMT state change · a74cfffb
      Thomas Gleixner committed
      arch_smt_update() is only called when the sysfs SMT control knob is
      changed. This means that when SMT is enabled in the sysfs control knob the
      system is considered to have SMT active even if all siblings are offline.
      
      To allow fine-grained control of the speculation mitigations, the actual SMT
      state is more interesting than the fact that siblings could be enabled.
      
      Rework the code, so arch_smt_update() is invoked from each individual CPU
      hotplug function, and simplify the update function while at it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
      
      a74cfffb
    • T
      sched/smt: Expose sched_smt_present static key · 321a874a
      Thomas Gleixner committed
      Make the scheduler's 'sched_smt_present' static key globally available, so
      it can be used in the x86 speculation control code.
      
      Provide a query function and a stub for the CONFIG_SMP=n case.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de
      321a874a
    • P
      sched, trace: Fix prev_state output in sched_switch tracepoint · 3054426d
      Pavankumar Kondeti committed
      commit 3f5fe9fe ("sched/debug: Fix task state recording/printout")
      tried to fix the problem introduced by a previous commit efb40f58
      ("sched/tracing: Fix trace_sched_switch task-state printing"). However
      the prev_state output in sched_switch is still broken.
      
      task_state_index() uses fls() which considers the LSB as 1. Left
      shifting 1 by this value gives an incorrect mapping to the task state.
      Fix this by decrementing the value returned by __get_task_state()
      before shifting.
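
The off-by-one can be shown with a small userspace sketch. `fls_sketch()`, `state_bit_buggy()` and `state_bit_fixed()` are hypothetical names, and `fls_sketch()` is a portable stand-in for the kernel's `fls()`, assumed to return the 1-based position of the most significant set bit (0 for an input of 0).

```c
#include <assert.h>

/* 1-based index of the most significant set bit; 0 when x == 0. */
int fls_sketch(unsigned int x)
{
    int pos = 0;
    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* Buggy mapping: shifts by the 1-based index, landing one bit too high. */
unsigned int state_bit_buggy(unsigned int state)
{
    assert(state < 0x80000000u);   /* keep the shift count below 32 */
    return 1u << fls_sketch(state);
}

/* Fixed mapping: decrement the index before shifting; 0 stays 0. */
unsigned int state_bit_fixed(unsigned int state)
{
    int idx = fls_sketch(state);
    return idx ? 1u << (idx - 1) : 0u;
}
```

For a state value of 0x2, the buggy mapping yields 0x4 while the fixed mapping yields 0x2, which is the bit that was actually set.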
      
      Link: http://lkml.kernel.org/r/1540882473-1103-1-git-send-email-pkondeti@codeaurora.org
      
      Cc: stable@vger.kernel.org
      Fixes: 3f5fe9fe ("sched/debug: Fix task state recording/printout")
      Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      3054426d
    • S
      function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack · 39eb456d
      Steven Rostedt (VMware) committed
      Currently, the depth of the ret_stack is determined by curr_ret_stack index.
      The issue is that there's a race between setting of the curr_ret_stack and
      calling of the callback attached to the return of the function.
      
      Commit 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling
      trace return callback") moved the calling of the callback to after the
      setting of the curr_ret_stack, even stating that it was safe to do so, when
      in fact, it was the reason there was a barrier() there (yes, I should have
      commented that barrier()).
      
      Not only does the curr_ret_stack keep track of the current call graph depth,
      it also keeps the ret_stack content from being overwritten by new data.
      
      The function profiler uses the "subtime" variable of ret_stack structure
      and by moving the curr_ret_stack, it allows for interrupts to use the same
      structure it was using, corrupting the data, and breaking the profiler.
      
      To fix this, there needs to be two variables to handle the call stack depth
      and the pointer to where the ret_stack is being used, as they need to change
      at two different locations.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      39eb456d
    • S
      function_graph: Make ftrace_push_return_trace() static · d125f3f8
      Steven Rostedt (VMware) committed
      As all architectures now call function_graph_enter() to do the entry work,
      no architecture should ever call ftrace_push_return_trace(). Make it static.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      d125f3f8
  7. 27 Nov, 2018 3 commits
  8. 26 Nov, 2018 2 commits
    • B
      gpio: davinci: restore a way to manually specify the GPIO base · 786a9ab1
      Bartosz Golaszewski committed
      Commit 587f7a69 ("gpio: davinci: Use dev name for label and
      automatic base selection") broke the network support in legacy boot
      mode for da850-evm since we can no longer request the MDIO clock GPIO.
      
      Other boards may be broken too, which I haven't tested.
      
      The problem is in the fact that most board files still use the legacy
      GPIO API where lines are requested by numbers rather than descriptors.
      
      While this should be fixed eventually, in order to unbreak the board
      for now - provide a way to manually specify the GPIO base in platform
      data.
      
      Fixes: 587f7a69 ("gpio: davinci: Use dev name for label and automatic base selection")
      Cc: stable@vger.kernel.org
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Sekhar Nori <nsekhar@ti.com>
      786a9ab1
    • F
      netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, too · 89259088
      Florian Westphal committed
      syzbot was able to trigger the WARN in cttimeout_default_get() by
      passing UDPLITE as l4protocol.  Alias UDPLITE to UDP, both use
      same timeout values.
      
      Furthermore, also fetch GRE timeouts.  GRE is a bit more complicated,
      as it still can be a module and its netns_proto_gre struct layout isn't
      visible outside of the gre module. The timeouts can't be moved around: it
      appears conntrack sysctl unregister assumes net_generic() returns
      nf_proto_net, so we would get a crash. Expose the layout of
      netns_proto_gre instead.
      
      A followup nf-next patch could make the gre tracker built-in as well
      if needed; it's not that large.
      
      Last, make the WARN() mention the missing protocol value in case
      anything else is missing.
      
      Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
      Fixes: 8866df92 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      89259088
  9. 24 Nov, 2018 1 commit
    • W
      packet: copy user buffers before orphan or clone · 5cd8d46e
      Willem de Bruijn committed
      tpacket_snd sends packets with user pages linked into skb frags. It
      notifies that pages can be reused when the skb is released by setting
      skb->destructor to tpacket_destruct_skb.
      
      This can cause data corruption if the skb is orphaned (e.g., on
      transmit through veth) or cloned (e.g., on mirror to another psock).
      
      Create a kernel-private copy of data in these cases, same as tun/tap
      zerocopy transmission. Reuse that infrastructure: mark the skb as
      SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
      
      Unlike other zerocopy packets, do not set shinfo destructor_arg to
      struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
      when the original skb is released and a timestamp is recorded. Do not
      change this timestamp behavior. The ubuf_info->callback is not needed
      anyway, as no zerocopy notification is expected.
      
      Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
      resulting value is not a valid ubuf_info pointer, nor a valid
      tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
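
The low-bit marking can be illustrated with a generic pointer-tagging sketch. The names `tag_nouarg()`, `is_nouarg()` and `untag_nouarg()` are hypothetical and are not the kernel's `skb_zcopy_.._nouarg` helpers; the sketch assumes the stored pointer is at least 2-byte aligned, so bit 0 is otherwise always clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mark a pointer as "not a uarg" by setting its lowest bit. */
void *tag_nouarg(void *p)
{
    /* Only valid if the pointer's low bit is free, i.e. it is aligned. */
    assert(((uintptr_t)p & 1u) == 0);
    return (void *)((uintptr_t)p | 1u);
}

/* True when the stored value carries the "not a uarg" mark. */
bool is_nouarg(const void *p)
{
    return ((uintptr_t)p & 1u) != 0;
}

/* Recover the original pointer by clearing the mark. */
void *untag_nouarg(void *p)
{
    return (void *)((uintptr_t)p & ~(uintptr_t)1);
}
```

Because a tagged value is odd, it cannot be mistaken for a valid aligned pointer, which is what lets a single field carry either kind of value.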
      
      The fix relies on features introduced in commit 52267790 ("sock:
      add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
      
      Tested with `./in_netns.sh ./txring_overwrite` from
      http://github.com/wdebruij/kerneltools/tests
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5cd8d46e
  10. 23 Nov, 2018 1 commit
    • T
      net/dim: Update DIM start sample after each DIM iteration · 0211dda6
      Tal Gilboa committed
      On every iteration of net_dim, the algorithm may choose to
      check the system state by comparing the current data sample
      with the previous data sample. After each of these comparisons,
      regardless of the action taken, the sample used as the baseline
      needs to be updated.
      
      This patch fixes a bug that causes DIM to take wrong decisions,
      due to never updating the baseline sample for comparison between
      iterations. This way, DIM always compares current sample with
      zeros.
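
The effect of refreshing the baseline can be sketched as follows. This is a simplified illustration, not the net_dim code: `struct dim_sample`, `struct dim_state` and `dim_iteration()` are hypothetical, and a single packet counter stands in for the full sample.

```c
#include <assert.h>

/* Hypothetical, reduced sample: a monotonically increasing packet counter. */
struct dim_sample {
    unsigned long packets;
};

/* Hypothetical state holding the baseline used for the next comparison. */
struct dim_state {
    struct dim_sample start_sample;
};

/*
 * Compare the current sample with the baseline and return the packet delta.
 * The baseline is refreshed on every call, regardless of what the caller
 * decides to do with the delta. Without that refresh, every comparison
 * would be made against the initial, all-zero baseline.
 */
unsigned long dim_iteration(struct dim_state *dim, const struct dim_sample *cur)
{
    assert(cur->packets >= dim->start_sample.packets);
    unsigned long delta = cur->packets - dim->start_sample.packets;
    dim->start_sample = *cur;
    return delta;
}
```

With the refresh in place, two consecutive samples of 100 and 150 packets produce deltas of 100 and 50 rather than 100 and 150.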
      
      Although this is a functional fix, it also improves and stabilizes
      performance as the algorithm works properly now.
      
      Performance:
      Tested single UDP TX stream with pktgen:
      samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1
      -m 24:8a:07:88:26:8b -f 3 -b 128
      
      ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps.
      Also, toggling between profiles is less frequent with the fix.
      
      Fixes: 8115b750 ("net/dim: use struct net_dim_sample as arg to net_dim")
      Signed-off-by: Tal Gilboa <talgi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0211dda6
  11. 22 Nov, 2018 4 commits
    • B
      Revert "Input: Add the `REL_WHEEL_HI_RES` event code" · ffe0e7cf
      Benjamin Tissoires committed
      This reverts commit aaf9978c.
      
      Quoting Peter:
      
      There is a HID feature report called "Resolution Multiplier"
      Described in the "Enhanced Wheel Support in Windows" doc and
      the "USB HID Usage Tables" page 30.
      
      http://download.microsoft.com/download/b/d/1/bd1f7ef4-7d72-419e-bc5c-9f79ad7bb66e/wheel.docx
      https://www.usb.org/sites/default/files/documents/hut1_12v2.pdf
      
      This was new for Windows Vista, so we're only a decade behind here. I only
      accidentally found this a few days ago while debugging a stuck button on a
      Microsoft mouse.
      
      The docs above describe it like this: a wheel control by default sends
      value 1 per notch. If the resolution multiplier is active, the wheel is
      expected to send a value of $multiplier per notch (e.g. MS Sculpt mouse) or
      just send events more often, i.e. for less physical motion (e.g. MS Comfort
      mouse).
      
      For the latter, you need the right HW of course. The Sculpt mouse has
      tactile wheel clicks, so nothing really changes. The Comfort mouse has
      continuous motion with no tactile clicks. Similar to the free-wheeling
      Logitech mice but without any inertia.
      
      Note that the doc also says that Vista and onwards *always* enable this
      feature where available.
      
      An example HID definition looks like this:
      
             Usage Page Generic Desktop (0x01)
             Usage Resolution Multiplier (0x48)
             Logical Minimum 0
             Logical Maximum 1
             Physical Minimum 1
             Physical Maximum 16
             Report Size 2 # in bits
             Report Count 1
             Feature (Data, Var, Abs)
      
      So the actual bits have values 0 or 1 and that reflects real values 1 or 16.
      We've only seen single-bits so far, so there's low-res and hi-res, but
      nothing in between.
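
The mapping from the logical bit value to the real multiplier is a linear scaling between the logical and physical ranges. A minimal sketch, assuming plain linear interpolation with integer arithmetic; `struct hid_range` and `hid_logical_to_physical()` are hypothetical names:

```c
#include <assert.h>

/* Hypothetical container for the four range fields of a HID item. */
struct hid_range {
    int logical_min;
    int logical_max;
    int physical_min;
    int physical_max;
};

/* Linearly map a logical value onto the physical range. */
int hid_logical_to_physical(const struct hid_range *r, int logical)
{
    assert(r->logical_max > r->logical_min);
    assert(logical >= r->logical_min && logical <= r->logical_max);
    return r->physical_min +
           (logical - r->logical_min) *
           (r->physical_max - r->physical_min) /
           (r->logical_max - r->logical_min);
}
```

With the example definition above (logical 0 to 1, physical 1 to 16), a logical value of 0 maps to a multiplier of 1 and a logical value of 1 maps to 16.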
      
      The multiplier is available for HID usages "Wheel" and "AC Pan" (horiz wheel).
      Microsoft suggests that
      
      > Vendors should ship their devices with smooth scrolling disabled and allow
      > Windows to enable it. This ensures that the device works like a regular HID
      > device on legacy operating systems that do not support smooth scrolling.
      (see the wheel doc linked above)
      
      The mice that we tested so far do reset on unplug.
      
      Device Support looks to be all (?) Microsoft mice but nothing else
      
      Not supported:
      - Logitech G500s, G303
      - Roccat Kone XTD
      - all the cheap Lenovo, HP, Dell, Logitech USB mice that come with a
        workstation that I could find don't have it.
      - Etekcity something something
      - Razer Imperator
      
      Supported:
      - Microsoft Comfort Optical Mouse 3000 - yes, physical: 1:4
      - Microsoft Sculpt Ergonomic Mouse - yes, physical: 1:12
      - Microsoft Surface mouse - yes, physical: 1:4
      
      So again, I think this is really just available on Microsoft mice, but
      probably all decent MS mice released over the last decade.
      
      Looking at the hardware itself:
      
      Microsoft Comfort Optical Mouse 3000:
      - no noticeable notches in the wheel
      - low-res: 18 events per 360deg rotation (click angle 20 deg)
      - high-res: 72 events per 360deg → matches multiplier of 4
      
      Microsoft Sculpt Ergonomic Mouse:
      - I can feel the notches during wheel turns
      - low-res: 24 events per 360 deg rotation (click angle 15 deg)
        - horiz wheel is tilt-based, continuous output value 1
      - high-res: 24 events per 360deg with value 12 → matches multiplier of 12
        - horiz wheel output rate doubles/triples?, value is 3
      
      Microsoft Surface mouse:
      - It's a touch strip, not a wheel so no notches
      - high-res: events have value 4 instead of 1
        a bit strange given that it doesn't actually have notches.
      
      Ok, why is this an issue for the current API? First, because the logitech
      multiplier used in Harry's patches looks suspiciously like the Resolution
      Multiplier so I think we should assume it's the same thing. Nestor, can you
      shed some light on that?
      
      - `REL_WHEEL` is defined as the number of notches, emulated where needed.
      - `REL_WHEEL_HI_RES` is the movement of the user's finger in microns.
      - `WM_MOUSEWHEEL` (Windows) is a multiple of 120, defined as "the threshold
        for action to be taken and one such action"
        https://docs.microsoft.com/en-us/windows/desktop/inputdev/wm-mousewheel
      
      If the multiplier is set to M, this means we need an accumulated value of M
      until we can claim there was a wheel click. So after enabling the multiplier
      and setting it to the maximum (like Windows):
      - M units are 15deg rotation → 1 unit is 2620/M micron (see below). This is
        the `REL_WHEEL_HI_RES` value.
        - wheel diameter 20mm: 15 deg rotation is 2.62mm, 2620 micron (pi * 20mm /
          (360deg/15deg))
      - For every M units accumulated, send one `REL_WHEEL` event
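
The micron figures follow from the arc length of the wheel circumference covered by one click. A small sketch of that arithmetic; the function names `microns_per_notch()` and `microns_per_unit()` are hypothetical, and the wheel diameter and click angle are simply passed in as parameters:

```c
#include <assert.h>

/* Wheel surface travel for one notch, in microns: pi * d * (angle / 360). */
double microns_per_notch(double diameter_mm, double click_angle_deg)
{
    const double pi = 3.14159265358979323846;
    assert(diameter_mm > 0.0 && click_angle_deg > 0.0);
    return pi * diameter_mm * 1000.0 * (click_angle_deg / 360.0);
}

/* Travel represented by one hi-res unit when a notch is split into M units. */
double microns_per_unit(double diameter_mm, double click_angle_deg,
                        int multiplier)
{
    assert(multiplier > 0);
    return microns_per_notch(diameter_mm, click_angle_deg) / multiplier;
}
```

For a 20mm wheel with a 15 degree click angle this gives roughly 2618 microns per notch, which the text rounds to 2620.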
      
      The problem here is that we've now hardcoded 20mm/15 deg into the kernel and
      we have no way of getting the size of the wheel or the click angle into the
      kernel.
      
      In userspace we now have to undo the kernel's calculation. If our click angle
      is e.g. 20 degree we have to undo the (lossy) calculation from the kernel and
      calculate the correct angle instead. This also means the 15 is a hardcoded
      option forever and cannot be changed.
      
      In hid-logitech-hidpp.c, the microns per unit is hardcoded per device.
      Harry, did you measure those by hand? We'd need to update the kernel for
      every device and there are 10 years worth of devices from MS alone.
      
      The multiplier default is 8 which is in the right ballpark, so I'm pretty
      sure this is the same as the Resolution Multiplier, just in HID++ lingo. And
      given that the 120 magic factor is what Windows uses in the end, I can't
      imagine Logitech rolling their own thing here. Nestor?
      
      And we're already fairly inaccurate with the microns anyway. The MX Anywhere
      2S has a click angle of 20 degrees (18 stops) and a 17mm wheel, so a wheel
      notch is approximately 2.67mm, one event at multiplier 8 (1/8 of a notch)
      would be 334 micron. That's only 80% of the fallback value of 406 in the
      kernel. Multiplier 6 gives us 445 micron (10% off). I'm assuming multiplier 7
      doesn't exist because it's not a factor of 120.
      
      Summary:
      
      The best option may be to simply do what Windows is doing; all the HW manufacturers
      have to use that approach after all. Switch `REL_WHEEL_HI_RES` to report in
      fractions of 120, with 120 being one notch and divide that by the multiplier
      for the actual events. So e.g. the Logitech multiplier 8 would send value 15
      for each event in hi-res mode. This can be converted in userspace to
      whatever userspace needs (combined with a hwdb there that tells you wheel
      size/click angle/...).
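The proposal in the summary can be sketched as follows. This is a minimal model under stated assumptions, not the kernel implementation: `hi_res_value`, `WheelAccumulator`, and `to_degrees` are hypothetical names, and the click angle passed to `to_degrees` stands in for whatever a userspace hwdb would supply.

```python
UNITS_PER_NOTCH = 120  # one notch, as proposed in the summary above


def hi_res_value(multiplier: int) -> int:
    """Value reported per hi-res event: 120 divided by the multiplier."""
    if multiplier <= 0 or UNITS_PER_NOTCH % multiplier != 0:
        raise ValueError("multiplier must be a positive factor of 120")
    return UNITS_PER_NOTCH // multiplier


class WheelAccumulator:
    """Accumulates hi-res deltas and reports whole notches (hypothetical helper)."""

    def __init__(self) -> None:
        self.remainder = 0

    def feed(self, hi_res_delta: int) -> int:
        """Add one hi-res delta; return the number of full notches completed."""
        self.remainder += hi_res_delta
        # Truncate toward zero so both scroll directions behave symmetrically.
        notches = int(self.remainder / UNITS_PER_NOTCH)
        self.remainder -= notches * UNITS_PER_NOTCH
        return notches


def to_degrees(hi_res_delta: int, click_angle_deg: float) -> float:
    """Userspace conversion, given a click angle looked up elsewhere."""
    return hi_res_delta * click_angle_deg / UNITS_PER_NOTCH


acc = WheelAccumulator()
step = hi_res_value(8)                       # 15 for a multiplier of 8
clicks = [acc.feed(step) for _ in range(8)]  # the eighth event completes a notch
```

With this scheme the kernel needs no knowledge of wheel size or click angle; a device with a 20 degree click angle simply maps each value of 15 to 2.5 degrees in userspace.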
      
      Conflicts:
      	include/uapi/linux/input-event-codes.h -> I kept the new
               reserved event in the code, so I had to adapt the revert
               slightly
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Acked-by: Harry Cutts <hcutts@chromium.org>
      Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      ffe0e7cf
    • B
      Revert "HID: input: Create a utility class for counting scroll events" · f1539a0c
      Benjamin Tissoires committed
      This reverts commit 1ff2e1a4.
      
      It turns out the current API is not that compatible with
      some Microsoft mice, so better start again from scratch.
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Acked-by: Harry Cutts <hcutts@chromium.org>
      Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      f1539a0c
    • E
      tcp: defer SACK compression after DupThresh · 86de5921
      Eric Dumazet committed
      Jean-Louis reported a TCP regression and bisected to recent SACK
      compression.
      
      After a loss episode (receiver not able to keep up and dropping
      packets because its backlog is full), the Linux TCP stack sends
      a single SACK (DUPACK).
      
      The sender then waits for a full RTO before recovering the losses.
      
      While RFC 6675 says in section 5, "Algorithm Details",
      
         (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
             indicating at least three segments have arrived above the current
             cumulative acknowledgment point, which is taken to indicate loss
             -- go to step (4).
      ...
         (4) Invoke fast retransmit and enter loss recovery as follows:
      
      there are old TCP stacks not implementing this strategy, and
      still counting the dupacks before starting fast retransmit.
      
      While these stacks probably perform poorly when receivers implement
      LRO/GRO, we should be a little more gentle to them.
      
      This patch makes sure we do not enable SACK compression unless
      3 dupacks have been sent since last rcv_nxt update.
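The rule described above can be modelled roughly as below. This is an illustrative sketch, not the kernel code: `AckCompressor` and its methods are hypothetical names, and the model only tracks whether a SACK is sent immediately or handed to the compression path.

```python
DUP_THRESH = 3  # dupacks an older sender counts before starting fast retransmit


class AckCompressor:
    """Receiver-side model: defer SACK compression until DupThresh is reached."""

    def __init__(self) -> None:
        # Number of dupacks sent since rcv_nxt last advanced.
        self.dupacks_sent = 0

    def on_rcv_nxt_update(self) -> None:
        """In-order data arrived and rcv_nxt advanced: restart the count."""
        self.dupacks_sent = 0

    def on_out_of_order_segment(self) -> str:
        """Decide how to handle the SACK triggered by an out-of-order segment."""
        if self.dupacks_sent < DUP_THRESH:
            self.dupacks_sent += 1
            return "send_now"   # let dupack-counting senders reach their threshold
        return "compress"       # further SACKs may be coalesced


rx = AckCompressor()
decisions = [rx.on_out_of_order_segment() for _ in range(5)]
```

In this model the first three SACKs after a hole appears go out immediately, so a sender that still counts dupacks can enter fast retransmit instead of waiting for an RTO.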
      
      Ideally we should even rearm the timer to send one or two
      more DUPACKs if no more packets are coming, but that will
      be work aiming for linux-4.21.
      
      Many thanks to Jean-Louis for bisecting the issue, providing
      packet captures and testing this patch.
      
      Fixes: 5d9f4262 ("tcp: add SACK compression")
      Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
      Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86de5921
    • R
      dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB · b3408715
      Robin Murphy committed
      With the overflow buffer removed, we no longer have a unique address
      which is guaranteed not to be a valid DMA target to use as an error
      token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
      an unlikely DMA target, but unfortunately there are already SWIOTLB
      users with DMA-able memory at physical address 0 which now gets falsely
      treated as a mapping failure and leads to all manner of misbehaviour.
      
      The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
      other commonly-used error value of all-bits-set, since the last single
      byte of memory is by far the least-likely-valid DMA target.
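The difference between the two error tokens can be illustrated with a small model. It assumes a 64-bit DMA address width purely for illustration, and `is_mapping_error` is a hypothetical helper, not the kernel's API.

```python
DMA_ADDR_BITS = 64  # assumption for this sketch; the real width is platform-dependent

ERROR_ZERO = 0                               # the old token described above
ERROR_ALL_ONES = (1 << DMA_ADDR_BITS) - 1    # the all-bits-set token


def is_mapping_error(dma_addr: int, error_token: int) -> bool:
    """A mapping is treated as failed when its address equals the error token."""
    return dma_addr == error_token


# A legitimate buffer placed at physical address 0:
valid_addr = 0
misreported_with_zero = is_mapping_error(valid_addr, ERROR_ZERO)
misreported_with_all_ones = is_mapping_error(valid_addr, ERROR_ALL_ONES)
```

With the all-bits-set token, only a mapping that starts at the very last byte of the address space would collide, which is why the message calls it the least-likely-valid DMA target.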
      
      Fixes: dff8d6c1 ("swiotlb: remove the overflow buffer")
      Reported-by: John Stultz <john.stultz@linaro.org>
      Tested-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      b3408715
  12. 20 Nov, 2018 1 commit
  13. 16 Nov, 2018 2 commits
  14. 15 Nov, 2018 1 commit
  15. 12 Nov, 2018 2 commits
  16. 10 Nov, 2018 2 commits