1. 07 8月, 2020 1 次提交
    • Y
      bpf: Change uapi for bpf iterator map elements · 5e7b3020
      Yonghong Song 提交于
      Commit a5cbe05a ("bpf: Implement bpf iterator for
      map elements") added bpf iterator support for
      map elements. The map element bpf iterator requires
      info to identify a particular map. In the above
      commit, the attr->link_create.target_fd is used
      to carry map_fd and an enum bpf_iter_link_info
      is added to uapi to specify the target_fd actually
      representing a map_fd:
          enum bpf_iter_link_info {
      	BPF_ITER_LINK_UNSPEC = 0,
      	BPF_ITER_LINK_MAP_FD = 1,
      
      	MAX_BPF_ITER_LINK_INFO,
          };
      
      This is an extensible approach as we can grow
      enumerator for pid, cgroup_id, etc. and we can
      unionize target_fd for pid, cgroup_id, etc.
      But in the future, there are chances that
      more complex customization may happen, e.g.,
      for tasks, it could be filtered based on
      both cgroup_id and user_id.
      
      This patch changed the uapi to have fields
      	__aligned_u64	iter_info;
      	__u32		iter_info_len;
      for additional iter_info for link_create.
      The iter_info is defined as
      	union bpf_iter_link_info {
      		struct {
      			__u32   map_fd;
      		} map;
      	};
      
      So future extension for additional customization
      will be easier. The bpf_iter_link_info will be
      passed to target callback to validate and generic
      bpf_iter framework does not need to deal it any
      more.
      
      Note that map_fd = 0 will be considered invalid
      and -EBADF will be returned to user space.
      
      Fixes: a5cbe05a ("bpf: Implement bpf iterator for map elements")
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com
      5e7b3020
  2. 05 8月, 2020 1 次提交
    • S
      ipv4: route: Ignore output interface in FIB lookup for PMTU route · df23bb18
      Stefano Brivio 提交于
      Currently, processes sending traffic to a local bridge with an
      encapsulation device as a port don't get ICMP errors if they exceed
      the PMTU of the encapsulated link.
      
      David Ahern suggested this as a hack, but it actually looks like
      the correct solution: when we update the PMTU for a given destination
      by means of updating or creating a route exception, the encapsulation
      might trigger this because of PMTU discovery happening either on the
      encapsulation device itself, or its lower layer. This happens on
      bridged encapsulations only.
      
      The output interface shouldn't matter, because we already have a
      valid destination. Drop the output interface restriction from the
      associated route lookup.
      
      For UDP tunnels, we will now have a route exception created for the
      encapsulation itself, with a MTU value reflecting its headroom, which
      allows a bridge forwarding IP packets originated locally to deliver
      errors back to the sending socket.
      
      The behaviour is now consistent with IPv6 and verified with selftests
      pmtu_ipv{4,6}_br_{geneve,vxlan}{4,6}_exception introduced later in
      this series.
      
      v2:
      - reset output interface only for bridge ports (David Ahern)
      - add and use netif_is_any_bridge_port() helper (David Ahern)
      Suggested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df23bb18
  3. 04 8月, 2020 6 次提交
  4. 03 8月, 2020 2 次提交
  5. 02 8月, 2020 3 次提交
  6. 01 8月, 2020 3 次提交
    • V
      sched: Document arch_scale_*_capacity() · f4470cdf
      Valentin Schneider 提交于
      Rather that hide their purpose in some dark, damp corner of Documentation/,
      add some documentation to the default implementations.
      Signed-off-by: NValentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200731192016.7484-2-valentin.schneider@arm.com
      f4470cdf
    • R
      rtnetlink: add support for protodown reason · 829eb208
      Roopa Prabhu 提交于
      netdev protodown is a mechanism that allows protocols to
      hold an interface down. It was initially introduced in
      the kernel to hold links down by a multihoming protocol.
      There was also an attempt to introduce protodown
      reason at the time but was rejected. protodown and protodown reason
      is supported by almost every switching and routing platform.
      It was ok for a while to live without a protodown reason.
      But, its become more critical now given more than
      one protocol may need to keep a link down on a system
      at the same time. eg: vrrp peer node, port security,
      multihoming protocol. Its common for Network operators and
      protocol developers to look for such a reason on a networking
      box (Its also known as errDisable by most networking operators)
      
      This patch adds support for link protodown reason
      attribute. There are two ways to maintain protodown
      reasons.
      (a) enumerate every possible reason code in kernel
          - A protocol developer has to make a request and
            have that appear in a certain kernel version
      (b) provide the bits in the kernel, and allow user-space
      (sysadmin or NOS distributions) to manage the bit-to-reasonname
      map.
      	- This makes extending reason codes easier (kind of like
            the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)
      
      This patch takes approach (b).
      
      a few things about the patch:
      - It treats the protodown reason bits as counter to indicate
      active protodown users
      - Since protodown attribute is already an exposed UAPI,
      the reason is not enforced on a protodown set. Its a no-op
      if not used.
      the patch follows the below algorithm:
        - presence of reason bits set indicates protodown
          is in use
        - user can set protodown and protodown reason in a
          single or multiple setlink operations
        - setlink operation to clear protodown, will return -EBUSY
          if there are active protodown reason bits
        - reason is not included in link dumps if not used
      
      example with patched iproute2:
      $cat /etc/iproute2/protodown_reasons.d/r.conf
      0 mlag
      1 evpn
      2 vrrp
      3 psecurity
      
      $ip link set dev vxlan0 protodown on protodown_reason vrrp on
      $ip link set dev vxlan0 protodown_reason mlag on
      $ip link show
      14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
      DEFAULT group default qlen 1000
          link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp>
      
      $ip link set dev vxlan0 protodown_reason mlag off
      $ip link set dev vxlan0 protodown off protodown_reason vrrp off
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      829eb208
    • Y
      tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS · 48040793
      Yousuk Seung 提交于
      This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS that reports
      the earliest departure time(EDT) of the timestamped skb. By tracking EDT
      values of the skb from different timestamps, we can observe when and how
      much the value changed. This allows to measure the precise delay
      injected on the sender host e.g. by a bpf-base throttler.
      Signed-off-by: NYousuk Seung <ysseung@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48040793
  7. 31 7月, 2020 7 次提交
  8. 30 7月, 2020 5 次提交
    • C
      PM / devfreq: Add support delayed timer for polling mode · 4dc3bab8
      Chanwoo Choi 提交于
      Until now, the devfreq driver using polling mode like simple_ondemand
      governor have used only deferrable timer for reducing the redundant
      power consumption. It reduces the CPU wake-up from idle due to polling mode
      which check the status of Non-CPU device.
      
      But, it has a problem for Non-CPU device like DMC device with DMA operation.
      Some Non-CPU device need to do monitor continuously regardless of CPU state
      in order to decide the proper next status of Non-CPU device.
      
      So, add support the delayed timer for polling mode to support
      the repetitive monitoring. The devfreq driver and user can select
      the kind of timer on either deferrable and delayed timer.
      
      For example, change the timer type of DMC device
      based on Exynos5422-based Odroid-XU3 as following:
      
      - If want to use deferrable timer as following:
      echo deferrable > /sys/class/devfreq/10c20000.memory-controller/timer
      
      - If want to use delayed timer as following:
      echo delayed > /sys/class/devfreq/10c20000.memory-controller/timer
      Reviewed-by: NBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Reviewed-by: NLukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: NChanwoo Choi <cw00.choi@samsung.com>
      4dc3bab8
    • A
      driver core: add device probe log helper · a787e540
      Andrzej Hajda 提交于
      During probe every time driver gets resource it should usually check for
      error printk some message if it is not -EPROBE_DEFER and return the error.
      This pattern is simple but requires adding few lines after any resource
      acquisition code, as a result it is often omitted or implemented only
      partially.
      dev_err_probe helps to replace such code sequences with simple call,
      so code:
      	if (err != -EPROBE_DEFER)
      		dev_err(dev, ...);
      	return err;
      becomes:
      	return dev_err_probe(dev, err, ...);
      Signed-off-by: NAndrzej Hajda <a.hajda@samsung.com>
      Reviewed-by: NRafael J. Wysocki <rafael@kernel.org>
      Reviewed-by: NMark Brown <broonie@kernel.org>
      Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
      Link: https://lore.kernel.org/r/20200713144324.23654-2-a.hajda@samsung.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a787e540
    • L
      random32: remove net_rand_state from the latent entropy gcc plugin · 83bdc727
      Linus Torvalds 提交于
      It turns out that the plugin right now ends up being really unhappy
      about the change from 'static' to 'extern' storage that happened in
      commit f227e3ec ("random32: update the net random state on interrupt
      and activity").
      
      This is probably a trivial fix for the latent_entropy plugin, but for
      now, just remove net_rand_state from the list of things the plugin
      worries about.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83bdc727
    • W
      random32: update the net random state on interrupt and activity · f227e3ec
      Willy Tarreau 提交于
      This modifies the first 32 bits out of the 128 bits of a random CPU's
      net_rand_state on interrupt or CPU activity to complicate remote
      observations that could lead to guessing the network RNG's internal
      state.
      
      Note that depending on some network devices' interrupt rate moderation
      or binding, this re-seeding might happen on every packet or even almost
      never.
      
      In addition, with NOHZ some CPUs might not even get timer interrupts,
      leaving their local state rarely updated, while they are running
      networked processes making use of the random state.  For this reason, we
      also perform this update in update_process_times() in order to at least
      update the state when there is user or system activity, since it's the
      only case we care about.
      Reported-by: NAmit Klein <aksecurity@gmail.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f227e3ec
    • N
      cpuidle: change enter_s2idle() prototype · efe97112
      Neal Liu 提交于
      Control Flow Integrity(CFI) is a security mechanism that disallows
      changes to the original control flow graph of a compiled binary,
      making it significantly harder to perform such attacks.
      
      init_state_node() assign same function callback to different
      function pointer declarations.
      
      static int init_state_node(struct cpuidle_state *idle_state,
                                 const struct of_device_id *matches,
                                 struct device_node *state_node) { ...
              idle_state->enter = match_id->data; ...
              idle_state->enter_s2idle = match_id->data; }
      
      Function declarations:
      
      struct cpuidle_state { ...
              int (*enter) (struct cpuidle_device *dev,
                            struct cpuidle_driver *drv,
                            int index);
      
              void (*enter_s2idle) (struct cpuidle_device *dev,
                                    struct cpuidle_driver *drv,
                                    int index); };
      
      In this case, either enter() or enter_s2idle() would cause CFI check
      failed since they use same callee.
      
      Align function prototype of enter() since it needs return value for
      some use cases. The return value of enter_s2idle() is no
      need currently.
      Signed-off-by: NNeal Liu <neal.liu@mediatek.com>
      Reviewed-by: NSami Tolvanen <samitolvanen@google.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      efe97112
  9. 29 7月, 2020 12 次提交