1. 09 Oct 2019: 1 commit
  2. 08 Oct 2019: 7 commits
  3. 07 Oct 2019: 4 commits
  4. 06 Oct 2019: 7 commits
    • libbpf: Add cscope and tags targets to Makefile · a9eb048d
      Committed by Toke Høiland-Jørgensen
      Using cscope and/or TAGS files for navigating the source code is useful.
      Add simple targets to the Makefile to generate the index files for both
      tools.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20191004153444.1711278-1-toke@redhat.com
    • Merge branch 'libbpf-api' · b84fbfe2
      Committed by Alexei Starovoitov
      Andrii Nakryiko says:
      
      ====================
      Add bpf_object__open_file() and bpf_object__open_mem() APIs that use a new
      approach to providing future-proof, non-ABI-breaking API changes. It relies on
      APIs accepting an optional, self-describing "opts" struct that contains its own
      size and is filled out and provided by a user application that may be older
      (or newer) than libbpf. A set of internal helper macros (OPTS_VALID, OPTS_HAS,
      and OPTS_GET) streamline and simplify graceful handling of forward and backward
      compatibility for user applications dynamically linked against different
      versions of the libbpf shared library.
      
      Users of libbpf are provided with a convenience macro, LIBBPF_OPTS, that takes
      care of populating the correct structure size and zero-initializes the options
      struct, which helps avoid obscure issues with uninitialized padding.
      Uninitialized padding in a struct might turn into garbage-populated new fields
      understood by future versions of libbpf.
      
      Patch #1 removes enforcement of kern_version in libbpf and always populates
      the correct one on behalf of users.
      Patch #2 defines the necessary infrastructure for options and two new open
      APIs relying on it.
      Patch #3 fixes a bug in bpf_object__name().
      Patch #4 switches two of test_progs' tests to the new APIs to validate that
      they work as expected.
      
      v2->v3:
      - fix LIBBPF_OPTS() to ensure zero-initialization of padded bytes;
      - pass through name override and relaxed maps flag for open_file() (Toke);
      - fix bpf_object__name() to actually return object name;
      - don't bother parsing and verifying version section (John);
      
      v1->v2:
      - use a better approach for tracking the last field in the opts struct;
      - convert a few tests to the new APIs for validation;
      - fix a bug with using offsetof(last_field) instead of offsetofend(last_field).
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
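      A minimal sketch of the caller side described in the cover letter: a LIBBPF_OPTS-style macro that zeroes the whole options struct, padding included, and records its size in the first field. DEMO_OPTS and struct demo_open_opts are hypothetical names, not libbpf's definitions, and libbpf's real macro may differ in detail (for example, in how fields are initialized).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options struct: the first field records the size the
 * caller was compiled against; the remaining fields are optional. */
struct demo_open_opts {
	size_t sz;
	const char *object_name;
	bool relaxed_maps; /* trailing padding usually follows this field */
};

/* Declare NAME, zero every byte (padding included, which a plain
 * designated initializer does not guarantee), then record the size.
 * Callers set the fields they care about afterwards. */
#define DEMO_OPTS(TYPE, NAME)           \
	struct TYPE NAME;               \
	memset(&NAME, 0, sizeof(NAME)); \
	NAME.sz = sizeof(NAME)
```

      Because every byte past the caller's last known field starts out as zero, a newer library that interprets those bytes as new fields sees default values instead of stack garbage. The macro expands to a declaration followed by statements, so it can only appear where a declaration is allowed.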
    • selftests/bpf: switch tests to new bpf_object__open_{file, mem}() APIs · 928ca75e
      Committed by Andrii Nakryiko
      Verify that the new bpf_object__open_mem() and bpf_object__open_file() APIs
      work as expected by switching the test_attach_probe test to an embedded BPF
      object opened with bpf_object__open_mem(), and test_reference_tracking to
      bpf_object__open_file().
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: fix bpf_object__name() to actually return object name · c9e4c301
      Committed by Andrii Nakryiko
      bpf_object__name() was returning the file path, not the object name. Fix this.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
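      The difference between a path and a name can be shown with a small sketch. The commit does not spell out how the name is derived, so this assumes, for illustration only, that an object's name is the last component of its path with everything from the first '.' removed; demo_object_name_from_path() is a hypothetical helper, not libbpf's code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the last path component into out and cut
 * it at the first '.'. Writes at most out_sz - 1 characters plus a NUL. */
static void demo_object_name_from_path(const char *path, char *out, size_t out_sz)
{
	const char *base = strrchr(path, '/');
	char *dot;

	if (out_sz == 0)
		return;
	base = base ? base + 1 : path;
	snprintf(out, out_sz, "%s", base); /* truncates safely if too long */
	dot = strchr(out, '.');
	if (dot)
		*dot = '\0';
}
```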
    • libbpf: add bpf_object__open_{file, mem} w/ extensible opts · 2ce8450e
      Committed by Andrii Nakryiko
      Add a new set of bpf_object__open APIs that use a new approach to extending
      optional parameters, allowing a simpler approach to ABI compatibility.
      
      This patch demonstrates an approach to implementing libbpf APIs that
      makes it easy to extend existing APIs with extra optional parameters in
      such a way that ABI compatibility is preserved without having to do
      symbol versioning and generate lots of boilerplate code to handle it.
      To facilitate succinct code for working with options, add OPTS_VALID,
      OPTS_HAS, and OPTS_GET macros that hide all the NULL, size, and zero
      checks.
      
      Additionally, newly added libbpf APIs are encouraged to follow a similar
      pattern: all mandatory parameters are formal function parameters, and there
      is always an optional (NULL-able) xxx_opts struct, which should always have
      the real struct size as its first field, followed by optional parameters
      added over time that tune the behavior of the existing API, if specified
      by the user.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
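      The NULL, size, and zero checks that such macros hide can be sketched as follows. This is a minimal illustration under assumptions, not libbpf's source: DEMO_OPTS_HAS, DEMO_OPTS_GET, demo_opts_valid() and the demo_* structs are hypothetical. A field counts as present only if the caller's sz reaches offsetofend() of that field, so an older caller's struct simply lacks newer fields and the library falls back to defaults, while a newer caller that sets bytes the library does not understand is rejected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Offset of the first byte past FIELD. Comparing sz against
 * offsetof(FIELD) alone would accept a struct ending just before FIELD. */
#define demo_offsetofend(TYPE, FIELD) \
	(offsetof(TYPE, FIELD) + sizeof(((TYPE *)0)->FIELD))

/* Is FIELD inside the (possibly older) struct the caller passed? */
#define DEMO_OPTS_HAS(opts, field) \
	((opts) && (opts)->sz >= demo_offsetofend(__typeof__(*(opts)), field))

/* FIELD's value, or FALLBACK if opts is NULL or too old to have it. */
#define DEMO_OPTS_GET(opts, field, fallback) \
	(DEMO_OPTS_HAS(opts, field) ? (opts)->field : (fallback))

/* A newer caller may pass a bigger struct; every byte beyond what this
 * library knows (lib_sz) must be zero, i.e. left at its default. */
static bool demo_opts_valid(const void *opts, size_t user_sz, size_t lib_sz)
{
	const unsigned char *bytes = opts;

	if (!opts)
		return true; /* NULL means "use all defaults" */
	if (user_sz < sizeof(size_t))
		return false; /* too small to even hold sz */
	for (size_t i = lib_sz; i < user_sz; i++)
		if (bytes[i])
			return false;
	return true;
}

/* Layout an older application was built against... */
struct demo_opts_v1 {
	size_t sz;
	const char *object_name;
};

/* ...and the layout this library was built against. */
struct demo_opts {
	size_t sz;
	const char *object_name;
	bool relaxed_maps; /* added later */
};

/* Returns the effective relaxed_maps setting (0 or 1), or -1 if opts
 * carries non-zero bytes this library does not understand. */
static int demo_relaxed_maps(const struct demo_opts *opts)
{
	if (!demo_opts_valid(opts, opts ? opts->sz : 0, sizeof(*opts)))
		return -1;
	return DEMO_OPTS_GET(opts, relaxed_maps, false) ? 1 : 0;
}
```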
    • libbpf: stop enforcing kern_version, populate it for users · 5e61f270
      Committed by Andrii Nakryiko
      Kernel version enforcement for kprobes/kretprobes was removed from the
      5.0 kernel in 6c4fc209 ("bpf: remove useless version check for prog load").
      Since then, BPF programs have been specifying SEC("version") just to please
      libbpf. We should stop enforcing this in libbpf if even the kernel doesn't
      care. Furthermore, libbpf will now pre-populate the current kernel version
      of the host system, in case we are still running on an old kernel.
      
      This patch also removes __bpf_object__open_xattr from libbpf.h, as
      nothing in libbpf relies on having it in that header. That function
      was never exported as LIBBPF_API, and even its name suggests it is an
      internal function. So it should be safe to remove, as doing so doesn't
      break the ABI.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
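      One way to obtain the host kernel version from user space is to parse the release string returned by uname(2). The sketch below assumes the release starts with "major.minor.patch" and packs it the way the kernel's KERNEL_VERSION() macro does; the demo_* names are hypothetical, and this is not libbpf's code.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Same packing as KERNEL_VERSION() in <linux/version.h>:
 * major.minor.patch -> 0xMMmmpp. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Parses a release string such as "5.4.0-rc1"; returns 0 if it does
 * not start with three dot-separated numbers. */
static unsigned int demo_parse_kernel_release(const char *release)
{
	unsigned int major, minor, patch;

	if (sscanf(release, "%u.%u.%u", &major, &minor, &patch) != 3)
		return 0;
	return DEMO_KERNEL_VERSION(major, minor, patch);
}

/* Version of the running kernel, or 0 if it cannot be determined. */
static unsigned int demo_host_kernel_version(void)
{
	struct utsname info;

	if (uname(&info) != 0)
		return 0;
	return demo_parse_kernel_release(info.release);
}
```

      Patch levels above 255 would spill into the minor byte; this sketch does not clamp them.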
    • libbpf: Fix BTF-defined map's __type macro handling of arrays · a53ba15d
      Committed by Andrii Nakryiko
      Due to C's quirky syntax for declaring pointers to arrays or function
      prototypes, the existing __type() macro doesn't work with map key/value
      types that are arrays or function prototypes. One has to create a typedef
      first and use it to specify the key/value type for a BPF map. By using
      typeof(), a pointer to the type is now handled uniformly for all kinds of
      types. Convert one of the selftests as a demonstration.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191004040211.2434033-1-andriin@fb.com
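      A compilable illustration of the syntax problem, using hypothetical macro names: pasting a type in front of '*' only works for simple types, while a typeof-based form declares a pointer to any type, arrays and function prototypes included. The sketch uses __typeof__, the GCC/Clang spelling that also works in strict ISO modes.

```c
/* Naive form: demo_type_naive(value, int[4]) would expand to
 * "int[4] *value", which does not parse, so arrays need a typedef. */
#define demo_type_naive(name, val) val *name

/* typeof-based form: "__typeof__(int[4]) *value" is a valid
 * pointer-to-array declaration, and likewise for function types. */
#define demo_type(name, val) __typeof__(val) *name

typedef int demo_arr4_t[4];

struct demo_map_def {
	demo_type_naive(key, demo_arr4_t); /* works only via the typedef */
	demo_type(value, int[4]);          /* array type used directly */
	demo_type(cb, int(int));           /* function prototype used directly */
};

static int demo_double(int x)
{
	return 2 * x;
}
```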
  5. 05 Oct 2019: 2 commits
  6. 03 Oct 2019: 2 commits
    • selftests/bpf: Correct path to include msg + path · c5881463
      Committed by Ivan Khoronzhuk
      The "path" buffer is supposed to hold the path plus a printf suffix of up
      to 24 bytes. snprintf() would cut the output anyway, but the compiler
      generates truncation warnings like:
      
      "
      samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c: In
      function ‘setup_cgroup_environment’:
      samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:34:
      warning: ‘/cgroup.controllers’ directive output may be truncated
      writing 19 bytes into a region of size between 1 and 4097
      [-Wformat-truncation=]
      snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
      				  ^~~~~~~~~~~~~~~~~~~
      samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:2:
      note: ‘snprintf’ output between 20 and 4116 bytes into a destination
      of size 4097
      snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:34:
      warning: ‘/cgroup.subtree_control’ directive output may be truncated
      writing 23 bytes into a region of size between 1 and 4097
      [-Wformat-truncation=]
      snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
      				  ^~~~~~~~~~~~~~~~~~~~~~~
      cgroup_path);
      samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:2:
      note: ‘snprintf’ output between 24 and 4120 bytes into a destination
      of size 4097
      snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
      cgroup_path);
      "
      
      To avoid the warnings, let's decrease the cgroup workdir buffer size by
      24 bytes, on the assumption that "/cgroup.subtree_control" must also be
      appended to the path. Truncation will never happen anyway.
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20191002120404.26962-3-ivan.khoronzhuk@linaro.org
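      The buffer arithmetic can be sketched as follows. Names are hypothetical; the sketch assumes a PATH_MAX + 1 destination buffer (the warnings above report a 4097-byte destination) and reserves 24 bytes in the workdir buffer because the longest suffix, "/cgroup.subtree_control", is 23 characters. It also checks snprintf()'s return value, so any truncation is reported rather than silent.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback if <limits.h> does not define it */
#endif

/* Room for the longest suffix appended to the workdir:
 * "/cgroup.subtree_control" is 23 characters. */
#define DEMO_SUFFIX_RESERVE 24

/* Hypothetical workdir buffer, shrunk by the reserve so that
 * workdir + suffix always fits in a PATH_MAX + 1 buffer. */
static char demo_workdir[PATH_MAX - DEMO_SUFFIX_RESERVE];

/* Writes "<dir>/<file>" into out; returns 0 on success, or -1 if
 * snprintf() failed or had to truncate the result. */
static int demo_join_path(char *out, size_t out_sz, const char *dir,
			  const char *file)
{
	int n = snprintf(out, out_sz, "%s/%s", dir, file);

	return (n < 0 || (size_t)n >= out_sz) ? -1 : 0;
}
```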
    • selftests/bpf: Add static to enable_all_controllers() · fb27dcd2
      Committed by Ivan Khoronzhuk
      Make enable_all_controllers() static to get rid of an annoying warning
      during the samples/bpf build:
      
      samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:44:5:
      warning: no previous prototype for ‘enable_all_controllers’
      [-Wmissing-prototypes]
       int enable_all_controllers(char *cgroup_path)
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20191002120404.26962-2-ivan.khoronzhuk@linaro.org
  7. 02 Oct 2019: 15 commits
    • libbpf: Bump current version to v0.0.6 · 03bd4773
      Committed by Andrii Nakryiko
      A new release cycle has started, so let's bump to v0.0.6 proactively.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20190930222503.519782-1-andriin@fb.com
    • dt-bindings: sh_eth convert bindings to json-schema · 37a2fce0
      Committed by Simon Horman
      Convert the Renesas Electronics SH EtherMAC bindings documentation to
      json-schema. Also name the bindings documentation file according to the
      compat string being documented.
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: usb: ax88179_178a: allow optionally getting mac address from device tree · 9fb137ae
      Committed by Peter Fink
      Adopt the feature for passing the MAC address via device tree from
      asix_device.c (03fc5d4f) and integrate it for the other ax88179-based ASIX
      chips as well. For example, the bootloader fills in local-mac-address, and
      the driver then picks up and uses this MAC address.
      Signed-off-by: Peter Fink <pfink@christ-es.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: minor code reorg in inet6_fill_ifla6_attrs() · 0d7982ce
      Committed by Nicolas Dichtel
      Put related code together to make it easier to read: the memcpy() is
      related to the nla_reserve().
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'netdev-altnames' · 7a56493f
      Committed by David S. Miller
      Jiri Pirko says:
      
      ====================
      net: introduce alternative names for network interfaces
      
      The IFNAMSIZ (16) limit on netdevice name length has been discussed
      repeatedly in the past. Now that we have PF and VF representors with port
      names like "pfXvfY", it has become quite common to hit this limit:
      0123456789012345
      enp131s0f1npf0vf6
      enp131s0f1npf0vf22
      
      Udev cannot rename these interfaces out of the box, and the user needs to
      create custom rules to handle them.
      
      Also, udev has multiple naming schemes for netdevs. From the udev code:
       * Type of names:
       *   b<number>                             - BCMA bus core number
       *   c<bus_id>                             - bus id of a grouped CCW or CCW device,
       *                                           with all leading zeros stripped [s390]
       *   o<index>[n<phys_port_name>|d<dev_port>]
       *                                         - on-board device index number
       *   s<slot>[f<function>][n<phys_port_name>|d<dev_port>]
       *                                         - hotplug slot index number
       *   x<MAC>                                - MAC address
       *   [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>]
       *                                         - PCI geographical location
       *   [P<domain>]p<bus>s<slot>[f<function>][u<port>][..][c<config>][i<interface>]
       *                                         - USB port number chain
       *   v<slot>                               - VIO slot number (IBM PowerVM)
       *   a<vendor><model>i<instance>           - Platform bus ACPI instance id
       *   i<addr>n<phys_port_name>              - Netdevsim bus address and port name
      
      One device can often be renamed according to multiple patterns at the
      same time (e.g. PCI address/MAC).
      
      This patchset introduces alternative names for network interfaces.
      The main goals are to:
      1) Overcome the IFNAMSIZ limitation (the altname limit is 128 bytes)
      2) Allow multiple names at the same time (multiple udev patterns)
      3) Allow alternative names to be used as handles for commands
      
      The patchset introduces two new commands to add/delete a list of properties.
      Currently only alternative names are implemented, but the infrastructure
      could easily be extended later on. This is very similar to the lists of
      VLANs and tunnels being added to/deleted from bridge ports.
      
      See the following examples.
      
      $ ip link
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
      
      -> Add alternative names for dummy0:
      
      $ ip link prop add dummy0 altname someothername
      $ ip link prop add dummy0 altname someotherveryveryveryverylongname
      $ ip link
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
          altname someothername
          altname someotherveryveryveryverylongname
      $ ip link show someotherveryveryveryverylongname
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
          altname someothername
          altname someotherveryveryveryverylongname
      
      -> Add bridge brx, add its alternative name, and use alternative names to
         do the enslavement.
      
      $ ip link add name brx type bridge
      $ ip link prop add brx altname mypersonalsuperspecialbridge
      $ ip link set someotherveryveryveryverylongname master mypersonalsuperspecialbridge
      $ ip link
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
          altname someothername
          altname someotherveryveryveryverylongname
      3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
          altname mypersonalsuperspecialbridge
      
      -> Add ipv4 address to the bridge using alternative name:
      
      $ ip addr add 192.168.0.1/24 dev mypersonalsuperspecialbridge
      $ ip addr show mypersonalsuperspecialbridge
      3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
          altname mypersonalsuperspecialbridge
          inet 192.168.0.1/24 scope global brx
             valid_lft forever preferred_lft forever
      
      -> Delete one of dummy0 alternative names:
      
      $ ip link prop del dummy0 altname someotherveryveryveryverylongname
      $ ip link
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
          altname someothername
      3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
          altname mypersonalsuperspecialbridge
      
      -> Add multiple alternative names at once
      
      $ ip link prop add dummy0 altname a altname b altname c altname d
      $ ip link
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
          altname someothername
          altname a
          altname b
          altname c
          altname d
      3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
          altname mypersonalsuperspecialbridge
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
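      The length arithmetic behind the examples above: IFNAMSIZ is 16 bytes including the terminating NUL, so at most 15 characters fit, while "enp131s0f1npf0vf6" has 17. A minimal sketch of that check, using POSIX IF_NAMESIZE from <net/if.h> (the same 16-byte value) and hypothetical helper names; it assumes the 128-byte altname limit also counts the NUL.

```c
#include <net/if.h> /* IF_NAMESIZE: 16 bytes, including the NUL */
#include <stdbool.h>
#include <string.h>

/* Altname limit from the cover letter; hypothetical constant name. */
#define DEMO_ALTNAME_SIZE 128

/* A name fits in a buffer of 'size' bytes only if strlen < size. */
static bool demo_name_fits(const char *name, size_t size)
{
	return strlen(name) < size;
}

static bool demo_fits_ifname(const char *name)
{
	return demo_name_fits(name, IF_NAMESIZE);
}

static bool demo_fits_altname(const char *name)
{
	return demo_name_fits(name, DEMO_ALTNAME_SIZE);
}
```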
    • net: rtnetlink: add possibility to use alternative names as message handle · 76c9ac0e
      Committed by Jiri Pirko
      Extend the basic rtnetlink commands to use alternative interface names
      as a handle instead of ifindex and ifname.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: introduce helper to get net_device instance by ifname · cc6090e9
      Committed by Jiri Pirko
      Introduce a helper function, rtnl_get_dev(), that gets a pointer to the
      net_device instance according to the passed ifname or ifname attribute.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: unify the code in __rtnl_newlink get dev with the rest · 7af12cba
      Committed by Jiri Pirko
      The __rtnl_newlink() code flow around tb[IFLA_IFNAME] processing is a bit
      different from the other places. Change it to be unified with the rest.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: put alternative names to getlink message · 88f4fb0c
      Committed by Jiri Pirko
      Extend the existing getlink info message with a list of properties. For
      now, the only ones are alternative names.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: add linkprop commands to add and delete alternative ifnames · 36fbf1e5
      Committed by Jiri Pirko
      Add two commands to add and delete a list of link properties. Implement
      the first property type along with them: alternative ifnames.
      Each net device can have multiple alternative names.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce name_node struct to be used in hashlist · ff927412
      Committed by Jiri Pirko
      Introduce a name_node structure to hold the name of a device, and put it
      into the hashlist instead of putting struct net_device there directly. Add
      the necessary infrastructure to manipulate the hashlist. This prepares
      the code to use the same hashlist for the alternative names introduced
      later in this set.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
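      A simplified user-space sketch of the idea: the hash table stores name nodes rather than devices, so a primary name and any number of alternative names can all resolve to the same device. Everything here is hypothetical (demo_* names, a single global table, FNV-1a hashing); the kernel's table is per network namespace and uses its own hash and list primitives.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HASH_SIZE 16u /* power of two, so masking works */

struct demo_dev {
	int ifindex;
};

/* One entry per name (primary or alternative). */
struct demo_name_node {
	struct demo_name_node *next; /* bucket chain */
	const char *name;            /* caller keeps the string alive */
	struct demo_dev *dev;        /* device this name resolves to */
};

static struct demo_name_node *demo_name_head[DEMO_HASH_SIZE];

/* FNV-1a string hash, folded to the table size. */
static unsigned int demo_name_hash(const char *name)
{
	uint32_t h = 2166136261u;

	for (; *name; name++) {
		h ^= (unsigned char)*name;
		h *= 16777619u;
	}
	return h & (DEMO_HASH_SIZE - 1);
}

static struct demo_name_node *demo_name_node_lookup(const char *name)
{
	struct demo_name_node *n;

	for (n = demo_name_head[demo_name_hash(name)]; n; n = n->next)
		if (strcmp(n->name, name) == 0)
			return n;
	return NULL;
}

/* Map name -> dev; fails if any device already uses the name. */
static int demo_name_node_add(struct demo_dev *dev, const char *name)
{
	struct demo_name_node *n;
	unsigned int b;

	if (demo_name_node_lookup(name))
		return -1;
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	b = demo_name_hash(name);
	n->name = name;
	n->dev = dev;
	n->next = demo_name_head[b];
	demo_name_head[b] = n;
	return 0;
}

static int demo_name_node_del(const char *name)
{
	struct demo_name_node **pp = &demo_name_head[demo_name_hash(name)];

	for (; *pp; pp = &(*pp)->next) {
		if (strcmp((*pp)->name, name) == 0) {
			struct demo_name_node *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 0;
		}
	}
	return -1;
}

/* Any of a device's names works as a handle. */
static struct demo_dev *demo_dev_get_by_name(const char *name)
{
	struct demo_name_node *n = demo_name_node_lookup(name);

	return n ? n->dev : NULL;
}
```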
    • net: procfs: use index hashlist instead of name hashlist · 6958c97a
      Committed by Jiri Pirko
      The name hashlist is going to be used for more than just dev->name, so use
      the index hashlist instead for iterating over net_device instances.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add ipv6_addr_v4mapped_loopback() helper · be2644aa
      Committed by Eric Dumazet
      tcp_twsk_unique() has a hard-coded assumption about IPv4 loopback
      being 127/8.
      
      Let's instead use the standard ipv4_is_loopback() method
      in a new ipv6_addr_v4mapped_loopback() helper.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
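      A user-space sketch of the same check. The demo_* names are hypothetical; IN6_IS_ADDR_V4MAPPED() comes from <netinet/in.h>, and the loopback test is 127.0.0.0/8, the range the kernel's ipv4_is_loopback() checks.

```c
#include <arpa/inet.h>  /* ntohl(), inet_pton() */
#include <netinet/in.h> /* struct in6_addr, IN6_IS_ADDR_V4MAPPED() */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h> /* AF_INET6 */

/* 127.0.0.0/8; this version takes a host-order address. */
static bool demo_ipv4_is_loopback(uint32_t addr_host)
{
	return (addr_host & 0xff000000u) == 0x7f000000u;
}

/* True for ::ffff:127.x.y.z addresses. */
static bool demo_ipv6_addr_v4mapped_loopback(const struct in6_addr *a)
{
	uint32_t v4;

	if (!IN6_IS_ADDR_V4MAPPED(a))
		return false;
	/* The embedded IPv4 address is the last 4 bytes, in network order. */
	memcpy(&v4, &a->s6_addr[12], sizeof(v4));
	return demo_ipv4_is_loopback(ntohl(v4));
}
```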
    • net: core: dev: replace state xoff flag comparison by netif_xmit_stopped method · 5be5515a
      Committed by Julio Faracco
      The function netif_schedule_queue() has a hard-coded comparison between the
      queue state and any xoff flag. This comparison does the same thing as
      netif_xmit_stopped(), which is clearer to read. See other functions such
      as generic_xdp_tx() and dev_direct_xmit().
      Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Factor out OOB link list waits · 5f71c840
      Committed by Prashant Malani
      The same for-loop check for the LINK_LIST_READY bit of an OOB_CTRL
      register is used in several places. Factor these out into a single
      function to reduce the number of lines of code.
      
      Change-Id: I20e8f327045a72acc0a83e2d145ae2993ab62915
      Signed-off-by: Prashant Malani <pmalani@chromium.org>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Acked-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
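      The refactoring pattern, one bounded polling helper instead of several copies of the same loop, can be sketched as below. The register accessor, bit value and retry count are hypothetical stand-ins rather than the driver's actual values, and a real driver would also sleep between reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the driver's real values. */
#define DEMO_LINK_LIST_READY 0x02u
#define DEMO_MAX_POLLS 1000

/* Reads the (hypothetical) OOB control register. */
typedef uint8_t (*demo_read_reg_fn)(void *ctx);

/* The single helper that replaces the duplicated loops: poll until the
 * ready bit is set or the budget runs out. Returns true if it was seen. */
static bool demo_wait_oob_link_list_ready(demo_read_reg_fn read_reg, void *ctx)
{
	for (int i = 0; i < DEMO_MAX_POLLS; i++) {
		if (read_reg(ctx) & DEMO_LINK_LIST_READY)
			return true;
	}
	return false;
}

/* Fake register for demonstration: reports ready from the Nth read on. */
struct demo_fake_reg {
	int ready_after;
	int reads;
};

static uint8_t demo_fake_read(void *ctx)
{
	struct demo_fake_reg *reg = ctx;

	reg->reads++;
	return reg->reads >= reg->ready_after ? DEMO_LINK_LIST_READY : 0;
}
```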
  8. 29 Sep 2019: 2 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 02dc96ef
      Committed by Linus Torvalds
      Pull networking fixes from David Miller:
      
       1) Sanity check URB networking device parameters to avoid divide by
          zero, from Oliver Neukum.
      
       2) Disable the global multicast filter in NCSI; otherwise LLDP and IPv6
          don't work properly. Longer term this needs a better fix, though. From
          Vijay Khemka.
      
       3) Small fixes to selftests (use ping when ping6 is not present, etc.)
          from David Ahern.
      
       4) Bring back the rt_uses_gateway member of struct rtable; its semantics
          were not well understood, and trying to remove it broke things. From
          David Ahern.
      
       5) Move usbnet sanity checking, ignore endpoints with invalid
          wMaxPacketSize. From Bjørn Mork.
      
       6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
      
       7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
          Alex Vesker, and Yevgeny Kliteynik.
      
       8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
      
       9) Fix crash when removing sch_cbs entry while offloading is enabled,
          from Vinicius Costa Gomes.
      
      10) Signedness bug fixes, generally in looking at the result given by
          of_get_phy_mode() and friends. From Dan Carpenter.
      
      11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
      
      12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
      
      13) Fix quantization code in tcp_bbr, from Kevin Yang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
        net: tap: clean up an indentation issue
        nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
        tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
        sk_buff: drop all skb extensions on free and skb scrubbing
        tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
        mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
        Documentation: Clarify trap's description
        mlxsw: spectrum: Clear VLAN filters during port initialization
        net: ena: clean up indentation issue
        NFC: st95hf: clean up indentation issue
        net: phy: micrel: add Asym Pause workaround for KSZ9021
        net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
        ptp: correctly disable flags on old ioctls
        lib: dimlib: fix help text typos
        net: dsa: microchip: Always set regmap stride to 1
        nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
        nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
        net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
        vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
        net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
        ...
    • Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes) · edf445ad
      Committed by Linus Torvalds
      Merge hugepage allocation updates from David Rientjes:
       "We (mostly Linus, Andrea, and myself) have been discussing offlist how
        to implement a sane default allocation strategy for hugepages on NUMA
        platforms.
      
        With these reverts in place, the page allocator will happily allocate
        a remote hugepage immediately rather than try to make a local hugepage
        available. This incurs a substantial performance degradation when
        memory compaction would have otherwise made a local hugepage
        available.
      
        This series reverts those reverts and attempts to propose a more sane
        default allocation strategy specifically for hugepages. Andrea
        acknowledges this is likely to fix the swap storms that he originally
        reported that resulted in the patches that removed __GFP_THISNODE from
        hugepage allocations.
      
        The immediate goal is to return 5.3 to the behavior the kernel has
        implemented over the past several years so that remote hugepages are
        not immediately allocated when local hugepages could have been made
        available because the increased access latency is untenable.
      
        The next goal is to introduce a sane default allocation strategy for
        hugepage allocations in general, regardless of the configuration of
        the system so that we prevent thrashing of local memory when
        compaction is unlikely to succeed and can prefer remote hugepages over
        remote native pages when the local node is low on memory."
      
      Note on timing: this reverts the hugepage VM behavior changes that got
      introduced fairly late in the 5.3 cycle, and that fixed a huge
      performance regression for certain loads that had been around since
      4.18.
      
      Andrea had this note:
      
       "The regression of 4.18 was that it was taking hours to start a VM
        where 3.10 was only taking a few seconds, I reported all the details
        on lkml when it was finally tracked down in August 2018.
      
           https://lore.kernel.org/linux-mm/20180820032640.9896-2-aarcange@redhat.com/
      
        __GFP_THISNODE in MADV_HUGEPAGE made the above enterprise vfio
        workload degrade like in the "current upstream" above. And it still
        would have been that bad as above until 5.3-rc5"
      
      where the bad behavior ends up happening as you fill up a local node,
      and without that change, you'd get into the nasty swap storm behavior
      due to compaction working overtime to make room for more memory on the
      nodes.
      
      As a result 5.3 got the two performance fix reverts in rc5.
      
      However, David Rientjes then noted that those performance fixes in turn
      regressed performance for other loads - although not quite to the same
      degree.  He suggested reverting the reverts and instead replacing them
      with two small changes to how hugepage allocations are done (patch
      descriptions rephrased by me):
      
       - "avoid expensive reclaim when compaction may not succeed": just admit
         that the allocation failed when you're trying to allocate a huge-page
         and compaction wasn't successful.
      
       - "allow hugepage fallback to remote nodes when madvised": when that
         node-local huge-page allocation failed, retry without forcing the
         local node.
      
      but by then I judged it too late to replace the fixes for a 5.3 release.
      So 5.3 was released with behavior that harked back to the pre-4.18 logic.
      
      But now we're in the merge window for 5.4, and we can see if this
      alternate model fixes not just the horrendous swap storm behavior, but
      also restores the performance regression that the late reverts caused.
      
      Fingers crossed.
      
      * emailed patches from David Rientjes <rientjes@google.com>:
        mm, page_alloc: allow hugepage fallback to remote nodes when madvised
        mm, page_alloc: avoid expensive reclaim when compaction may not succeed
        Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""
        Revert "Revert "mm, thp: restore node-local hugepage allocations""