1. 23 Sep 2019 (1 commit)
  2. 18 Sep 2019 (3 commits)
  3. 17 Sep 2019 (6 commits)
  4. 16 Sep 2019 (12 commits)
    • tcp: Add snd_wnd to TCP_INFO · 8f7baad7
      By Thomas Higdon
      Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
      performance problems --
      > (1) Usually when we're diagnosing TCP performance problems, we do so
      > from the sender, since the sender makes most of the
      > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
      > From the sender-side the thing that would be most useful is to see
      > tp->snd_wnd, the receive window that the receiver has advertised to
      > the sender.
      
      This serves the purpose of adding an additional __u32 to avoid the
      would-be hole caused by the addition of the tcpi_rcv_ooopack field.
      Signed-off-by: Thomas Higdon <tph@fb.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
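      A minimal userspace sketch (not part of the commit) of reading the new
      field from the sender side, assuming kernel and libc headers that
      already carry tcpi_snd_wnd:

        #include <stdio.h>
        #include <netinet/in.h>
        #include <netinet/tcp.h>
        #include <sys/socket.h>

        /* Print the receiver-advertised window of a connected TCP socket. */
        static int print_snd_wnd(int fd)
        {
                struct tcp_info ti;
                socklen_t len = sizeof(ti);

                if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
                        return -1;
                printf("snd_wnd: %u\n", ti.tcpi_snd_wnd);
                return 0;
        }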
    • tcp: Add TCP_INFO counter for packets received out-of-order · f9af2dbb
      By Thomas Higdon
      For receive-heavy cases on the server-side, we want to track the
      connection quality for individual client IPs. This counter, similar to
      the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
      tracks out-of-order packet reception. By providing this counter in
      TCP_INFO, it becomes possible to see to what degree receive-heavy
      sockets are experiencing out-of-order delivery and packet drops that
      indicate congestion.
      
      Please note that this is similar to the counter in NetBSD TCP_INFO, and
      has the same name.
      
      Also note that we avoid increasing the size of the tcp_sock struct by
      taking advantage of a hole.
      Signed-off-by: Thomas Higdon <tph@fb.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
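      The same getsockopt(TCP_INFO) pattern from the sketch above applies on
      the receive side; a hedged extension (headers as above, plus unistd.h
      for sleep()) that samples the counter twice to watch out-of-order
      arrivals per connection:

        /* Assumes fd is a connected TCP socket, as in the sketch above. */
        static unsigned int ooo_per_second(int fd)
        {
                struct tcp_info a, b;
                socklen_t len = sizeof(a);

                getsockopt(fd, IPPROTO_TCP, TCP_INFO, &a, &len);
                sleep(1);
                len = sizeof(b);
                getsockopt(fd, IPPROTO_TCP, TCP_INFO, &b, &len);
                /* delta of out-of-order segments over ~1 second */
                return b.tcpi_rcv_ooopack - a.tcpi_rcv_ooopack;
        }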
    • dm: introduce DM_GET_TARGET_VERSION · afa179eb
      By Mikulas Patocka
      This commit introduces a new ioctl DM_GET_TARGET_VERSION. It will load a
      target that is specified in the "name" entry in the parameter structure
      and return its version.
      
      This functionality is intended to be used by cryptsetup, so that it can
      query kernel capabilities before activating the device.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
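      A rough userspace sketch of the intended use, under the assumption that
      the reply is packed into the data area as struct dm_target_versions,
      the same layout DM_LIST_VERSIONS uses; the target name "crypt" is just
      an example:

        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <linux/dm-ioctl.h>

        int main(void)
        {
                char buf[16384];
                struct dm_ioctl *dmi = (struct dm_ioctl *)buf;
                struct dm_target_versions *v;
                int fd;

                memset(buf, 0, sizeof(buf));
                dmi->version[0] = DM_VERSION_MAJOR;
                dmi->version[1] = DM_VERSION_MINOR;
                dmi->version[2] = DM_VERSION_PATCHLEVEL;
                dmi->data_size = sizeof(buf);
                dmi->data_start = sizeof(*dmi);
                strncpy(dmi->name, "crypt", sizeof(dmi->name) - 1);

                fd = open("/dev/mapper/control", O_RDWR);
                if (fd < 0 || ioctl(fd, DM_GET_TARGET_VERSION, dmi) < 0) {
                        perror("DM_GET_TARGET_VERSION");
                        return 1;
                }
                v = (struct dm_target_versions *)(buf + dmi->data_start);
                printf("%s %u.%u.%u\n", v->name,
                       v->version[0], v->version[1], v->version[2]);
                return 0;
        }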
    • bpf: fix accessing bpf_sysctl.file_pos on s390 · d895a0f1
      By Ilya Leoshkevich
      "ctx:file_pos sysctl:read write ok" fails on s390 with "Read value  !=
      nux". This is because verifier rewrites a complete 32-bit
      bpf_sysctl.file_pos update to a partial update of the first 32 bits of
      64-bit *bpf_sysctl_kern.ppos, which is not correct on big-endian
      systems.
      
      Fix by using an offset on big-endian systems.
      
      Ditto for bpf_sysctl.file_pos reads. Currently the test does not detect
      a problem there, since it expects to see 0, which it gets with high
      probability in error cases, so change it to seek to offset 3 and expect
      3 in bpf_sysctl.file_pos.
      
      Fixes: e1550bfe ("bpf: Add file_pos field to bpf_sysctl ctx")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20190816105300.49035-1-iii@linux.ibm.com/
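      Not the verifier code, but a self-contained illustration of why a
      32-bit store at byte offset 0 into a 64-bit slot corrupts the value on
      big-endian, and why storing at the fixed offset works:

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                uint64_t pos;
                uint32_t val = 3;

                /* A 4-byte store at offset 0 updates the *high* half on
                 * big-endian, yielding 3 << 32 instead of 3. */
                pos = 0;
                memcpy((char *)&pos, &val, sizeof(val));
                printf("offset 0: %llu\n", (unsigned long long)pos);

                /* Storing at offset sizeof(u64) - sizeof(u32) is what the
                 * fix does for big-endian systems. */
                pos = 0;
                memcpy((char *)&pos + sizeof(pos) - sizeof(val),
                       &val, sizeof(val));
                printf("offset 4: %llu\n", (unsigned long long)pos);
                return 0;
        }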
    • net: sched: use get_dev() action API in flow_action infra · 470d5060
      By Vlad Buslov
      When filling in the hardware intermediate representation,
      tc_setup_flow_action() directly obtains, checks, and takes a reference
      to the dev used by the mirred action, instead of using the
      act->ops->get_dev() API created specifically for this purpose. In
      order to remove code duplication, refactor the flow_action infra to
      use the action API when obtaining the mirred action's target dev.
      Extend get_dev() with an additional argument that hands a dev
      destructor back to the caller (see the sketch after the sign-offs).
      
      Fixes: 5a6ff4b1 ("net: sched: take reference to action dev before calling offloads")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
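      As I read the change, the extended callback has roughly this shape (a
      paraphrased sketch of the mirred side, not the exact diff):

        typedef void (*tc_action_priv_destructor)(void *priv);

        static void tcf_mirred_dev_put(void *priv)
        {
                dev_put((struct net_device *)priv);
        }

        static struct net_device *
        tcf_mirred_get_dev(const struct tc_action *a,
                           tc_action_priv_destructor *destructor)
        {
                struct tcf_mirred *m = to_mirred(a);
                struct net_device *dev;

                rcu_read_lock();
                dev = rcu_dereference(m->tcfm_dev);
                if (dev) {
                        dev_hold(dev);  /* reference travels with the entry */
                        *destructor = tcf_mirred_dev_put;
                }
                rcu_read_unlock();
                return dev;
        }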
    • net: sched: take reference to psample group in flow_action infra · 4a5da47d
      By Vlad Buslov
      With the recent patch set that removed the rtnl lock dependency from
      the cls hardware offload API, the rtnl lock is only taken when reading
      action data and can be released after action-specific data is parsed
      into the intermediate representation. However, the sample action's
      psample group is passed by pointer without first obtaining a reference
      to it, which makes it possible to concurrently overwrite the action
      and deallocate the object pointed to by the psample_group pointer
      after the rtnl lock is released but before the driver has finished
      using the pointer.
      
      To prevent such a race condition, obtain a reference to the psample
      group while it is used by the flow_action infra. Extend the psample API
      with a psample_group_take() function that increments the psample group
      reference counter (sketched below). Extend struct tc_action_ops with a
      new get_psample_group() API. Implement the
      API for action sample using psample_group_take() and the already
      existing psample_group_put() as a destructor. Use it in
      tc_setup_flow_action() to take a reference to the psample group
      pointed to by entry->sample.psample_group, and release it in
      tc_cleanup_flow_action().
      
      Disable bh when taking psample_groups_lock. The lock is now taken
      while holding the action tcf_lock, which is used by the data path and
      requires bh to be disabled, so doing the same for psample_groups_lock
      is necessary to preserve SOFTIRQ-irq-safety.
      
      Fixes: 918190f5 ("net: sched: flower: don't take rtnl lock for cls hw offloads API")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
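      A paraphrased sketch of the new helper with the bh-disabled locking the
      message describes (exact field names are an assumption):

        void psample_group_take(struct psample_group *group)
        {
                spin_lock_bh(&psample_groups_lock);
                group->refcount++;
                spin_unlock_bh(&psample_groups_lock);
        }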
    • net: sched: extend flow_action_entry with destructor · 1158958a
      By Vlad Buslov
      Generalize flow_action_entry cleanup by extending the structure with a
      pointer to a destructor function. Set the destructor in
      tc_setup_flow_action(). Refactor tc_cleanup_flow_action() to call
      entry->destructor() instead of using a switch that dispatches on
      entry->id and manually executes the cleanup.
      
      This refactoring is necessary for the following patches in this
      series, which require the destructor to use tc_action->ops callbacks
      that can't be easily obtained in tc_cleanup_flow_action().
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
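      A minimal runnable model of the pattern in plain C (not the kernel
      code): cleanup invokes the destructor stored in each entry instead of
      switching on entry->id:

        #include <stdio.h>
        #include <stddef.h>

        struct entry {
                void *priv;                  /* e.g. dev or psample_group */
                void (*destructor)(void *);  /* set where priv is taken */
        };

        static void put_dev(void *p) { printf("dev_put(%s)\n", (char *)p); }

        static void cleanup(struct entry *e, size_t n)
        {
                for (size_t i = 0; i < n; i++)
                        if (e[i].destructor)  /* no switch on an id */
                                e[i].destructor(e[i].priv);
        }

        int main(void)
        {
                struct entry e[] = { { "eth0", put_dev } };
                cleanup(e, 1);
                return 0;
        }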
    • udp: correct reuseport selection with connected sockets · acdcecc6
      By Willem de Bruijn
      UDP reuseport groups can hold a mix of unconnected and connected
      sockets. Ensure that connections only receive traffic sent to their
      4-tuple.
      
      Fast reuseport returns on the first reuseport match, on the assumption
      that all matches are equal. Only if connections are present do we
      return to the previous behavior of scoring all sockets.
      
      Record if connections are present and if so (1) treat such connected
      sockets as an independent match from the group, (2) only return
      2-tuple matches from reuseport and (3) do not return on the first
      2-tuple reuseport match to allow for a higher scoring match later.
      
      The new field has_conns is set without locks. No other fields in the
      bitmap are modified at runtime, and the field is only ever set
      unconditionally, so an RMW cannot miss a change.
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Craig Gallek <kraig@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
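      As I understand the change, the group-level flag is read and set
      through a small helper along these lines (paraphrased sketch, not the
      exact diff); set=true marks the group when a socket connects,
      set=false queries it during lookup:

        static inline bool reuseport_has_conns(struct sock *sk, bool set)
        {
                struct sock_reuseport *reuse;
                bool ret = false;

                rcu_read_lock();
                reuse = rcu_dereference(sk->sk_reuseport_cb);
                if (reuse) {
                        if (set)
                                reuse->has_conns = 1;
                        ret = reuse->has_conns;
                }
                rcu_read_unlock();
                return ret;
        }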
    • block: make rq sector size accessible for block stats · 3d244306
      By Hou Tao
      Currently rq->data_len will be decreased by partial completion or
      zeroed by completion, so when blk_stat_add() is invoked, data_len
      will be zero and there will never be samples in poll_cb because
      blk_mq_poll_stats_bkt() will return -1 if data_len is zero.
      
      We could move blk_stat_add() back to __blk_mq_complete_request(),
      but that would defeat the effort of calling ktime_get_ns() only once.
      Instead we can reuse the throtl_size field for both block stats and
      block throttle, and adjust the logic in blk_mq_poll_stats_bkt()
      accordingly.
      
      Fixes: 4bc6339a ("block: move blk_stat_add() to __blk_mq_end_request()")
      Tested-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • net/sched: fix race between deactivation and dequeue for NOLOCK qdisc · d518d2ed
      By Paolo Abeni
      The test implemented by some_qdisc_is_busy() is somewhat loose for
      NOLOCK qdisc, as we may hit the following scenario:
      
      CPU1						CPU2
      // in net_tx_action()
      clear_bit(__QDISC_STATE_SCHED...);
      						// in some_qdisc_is_busy()
      						val = (qdisc_is_running(q) ||
      						       test_bit(__QDISC_STATE_SCHED,
      								&q->state));
      						// here val is 0 but...
      qdisc_run(q)
      // ... CPU1 is going to run the qdisc next
      
      As a consequence, qdisc_run() in net_tx_action() can race with
      qdisc_reset() in dev_qdisc_reset(). Such a race is not possible for
      !NOLOCK qdisc, as both of the above bit operations are under the root
      qdisc lock.
      
      After commit 021a17ed ("pfifo_fast: drop unneeded additional lock on dequeue")
      the race can cause use after free and/or null ptr dereference, but the root
      cause is likely older.
      
      This patch addresses the issue by explicitly checking for deactivation
      under the seqlock for NOLOCK qdisc, so that qdisc_run() in the
      critical scenario becomes a no-op (see the sketch after the
      sign-offs).
      
      Note that the enqueue() op can still execute concurrently with dev_qdisc_reset(),
      but that is safe due to the skb_array() locking, and we can't avoid that
      for NOLOCK qdiscs.
      
      Fixes: 021a17ed ("pfifo_fast: drop unneeded additional lock on dequeue")
      Reported-by: Li Shuang <shuali@redhat.com>
      Reported-and-tested-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
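      As sketched here (a paraphrase, not the exact diff), the fix amounts
      to re-checking the deactivated bit once inside the running seqlock, so
      a racing qdisc_run() backs off:

        static inline void qdisc_run(struct Qdisc *q)
        {
                if (qdisc_run_begin(q)) {
                        /* dev_deactivate() may have set the bit after our
                         * caller saw __QDISC_STATE_SCHED cleared; checking
                         * under the seqlock turns this run into a no-op. */
                        if (!test_bit(__QDISC_STATE_DEACTIVATED, &q->state))
                                __qdisc_run(q);
                        qdisc_run_end(q);
                }
        }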
    • compiler-types.h: add asm_inline definition · eb111869
      By Rasmus Villemoes
      This adds an asm_inline macro which expands to "asm inline" [1] when
      the compiler supports it. This is currently gcc 9.1+, gcc 8.3
      and (once released) gcc 7.5 [2]. It expands to just "asm" for other
      compilers.
      
      Using asm inline("foo") instead of asm("foo") overrules gcc's
      heuristic estimate of the size of the code represented by the asm()
      statement, and makes gcc use the minimum possible size instead. That
      can in turn affect gcc's inlining decisions.
      
      I wasn't sure whether to make this a function-like macro or not - this
      way, it can be combined with volatile as
      
        asm_inline volatile()
      
      but perhaps we'd prefer to spell that
      
        asm_inline_volatile()
      
      anyway.
      
      The Kconfig logic is taken from an RFC patch by Masahiro Yamada [3].
      
      [1] Technically, asm __inline, since both inline and __inline__
      are macros that attach various attributes, making gcc barf if one
      literally does "asm inline()". However, the third spelling __inline is
      available for referring to the bare keyword.
      
      [2] https://lore.kernel.org/lkml/20190907001411.GG9749@gate.crashing.org/
      
      [3] https://lore.kernel.org/lkml/1544695154-15250-1-git-send-email-yamada.masahiro@socionext.com/
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
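      The definition presumably reduces to something like the following
      (paraphrased; CONFIG_CC_HAS_ASM_INLINE is the Kconfig symbol the logic
      gates on), with a usage line for illustration:

        /* compiler_types.h (sketch) */
        #ifdef CONFIG_CC_HAS_ASM_INLINE
        #define asm_inline asm __inline
        #else
        #define asm_inline asm
        #endif

        /* the asm body is then treated as minimum-size for inlining: */
        asm_inline volatile ("nop" ::: "memory");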
    • compiler_types.h: don't #define __inline · c30724e9
      By Rasmus Villemoes
      The spellings __inline and __inline__ should be reserved for uses
      where one really wants to refer to the inline keyword, regardless of
      whether or not the spelling "inline" has been #defined to something
      else. Due to use of __inline__ in uapi headers, we can't easily get
      rid of the definition of __inline__. However, almost all users of
      __inline have been converted to inline, so we can get rid of that
      #define.
      
      The exception is include/acpi/platform/acintel.h. However, that header
      is only included when using the intel compiler (does anybody actually
      build the kernel with that?), and the ACPI_INLINE macro is only used
      in the definition of utterly trivial stub functions, where I doubt a
      small change of semantics (lack of __gnu_inline) changes anything.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      [Fix trivial typo in message]
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
  5. 14 Sep 2019 (6 commits)
  6. 13 Sep 2019 (12 commits)