1. 17 10月, 2019 1 次提交
  2. 10 10月, 2019 1 次提交
  3. 07 10月, 2019 1 次提交
  4. 06 10月, 2019 4 次提交
  5. 05 10月, 2019 2 次提交
    • A
      net, uapi: fix -Wpointer-arith warnings · d6547f2a
      Alexey Dobriyan 提交于
      Add casts to fix these warnings:
      
      ./usr/include/linux/netfilter_arp/arp_tables.h:200:19: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
      ./usr/include/linux/netfilter_bridge/ebtables.h:197:19: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
      ./usr/include/linux/netfilter_ipv4/ip_tables.h:223:19: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
      ./usr/include/linux/netfilter_ipv6/ip6_tables.h:263:19: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
      ./usr/include/linux/tipc_config.h:310:28: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
      ./usr/include/linux/tipc_config.h:410:24: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
      ./usr/include/linux/virtio_ring.h:170:16: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
      
      Those are theoretical probably but kernel doesn't control compiler flags
      in userspace.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6547f2a
    • J
      net: devlink: allow to change namespaces during reload · 070c63f2
      Jiri Pirko 提交于
      All devlink instances are created in init_net and stay there for a
      lifetime. Allow user to be able to move devlink instances into
      namespaces during devlink reload operation. That ensures proper
      re-instantiation of driver objects, including netdevices.
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      070c63f2
  6. 04 10月, 2019 2 次提交
  7. 03 10月, 2019 1 次提交
  8. 02 10月, 2019 1 次提交
  9. 01 10月, 2019 2 次提交
  10. 28 9月, 2019 1 次提交
    • J
      ptp: correctly disable flags on old ioctls · 2df4de16
      Jacob Keller 提交于
      Commit 41560658 ("PTP: introduce new versions of IOCTLs",
      2019-09-13) introduced new versions of the PTP ioctls which actually
      validate that the flags are acceptable values.
      
      As part of this, it cleared the flags value using a bitwise
      and+negation, in an attempt to prevent the old ioctl from accidentally
      enabling new features.
      
      This is incorrect for a couple of reasons. First, it results in
      accidentally preventing previously working flags on the request ioctl.
      By clearing the "valid" flags, we now no longer allow setting the
      enable, rising edge, or falling edge flags.
      
      Second, if we add new additional flags in the future, they must not be
      set by the old ioctl. (Since the flag wasn't checked before, we could
      potentially break userspace programs which sent garbage flag data.
      
      The correct way to resolve this is to check for and clear all but the
      originally valid flags.
      
      Create defines indicating which flags are correctly checked and
      interpreted by the original ioctls. Use these to clear any bits which
      will not be correctly interpreted by the original ioctls.
      
      In the future, new flags must be added to the VALID_FLAGS macros, but
      *not* to the V1_VALID_FLAGS macros. In this way, new features may be
      exposed over the v2 ioctls, but without breaking previous userspace
      which happened to not clear the flags value properly. The old ioctl will
      continue to behave the same way, while the new ioctl gains the benefit
      of using the flags fields.
      
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Christopher Hall <christopher.s.hall@intel.com>
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Acked-by: NRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2df4de16
  11. 26 9月, 2019 2 次提交
  12. 25 9月, 2019 1 次提交
    • M
      netfilter: ebtables: use __u8 instead of uint8_t in uapi header · 20ff1cb5
      Masahiro Yamada 提交于
      When CONFIG_UAPI_HEADER_TEST=y, exported headers are compile-tested to
      make sure they can be included from user-space.
      
      Currently, linux/netfilter_bridge/ebtables.h is excluded from the test
      coverage. To make it join the compile-test, we need to fix the build
      errors attached below.
      
      For a case like this, we decided to use __u{8,16,32,64} variable types
      in this discussion:
      
        https://lkml.org/lkml/2019/6/5/18
      
      Build log:
      
        CC      usr/include/linux/netfilter_bridge/ebtables.h.s
      In file included from <command-line>:32:0:
      ./usr/include/linux/netfilter_bridge/ebtables.h:126:4: error: unknown type name ‘uint8_t’
          uint8_t revision;
          ^~~~~~~
      ./usr/include/linux/netfilter_bridge/ebtables.h:139:4: error: unknown type name ‘uint8_t’
          uint8_t revision;
          ^~~~~~~
      ./usr/include/linux/netfilter_bridge/ebtables.h:152:4: error: unknown type name ‘uint8_t’
          uint8_t revision;
          ^~~~~~~
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      20ff1cb5
  13. 24 9月, 2019 1 次提交
  14. 20 9月, 2019 1 次提交
  15. 19 9月, 2019 3 次提交
    • A
      bpf: fix BTF limits · a0791f0d
      Alexei Starovoitov 提交于
      vmlinux BTF has more than 64k types.
      Its string section is also at the offset larger than 64k.
      Adjust both limits to make in-kernel BTF verifier successfully parse in-kernel BTF.
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      a0791f0d
    • S
      virtio-fs: add virtiofs filesystem · a62a8ef9
      Stefan Hajnoczi 提交于
      Add a basic file system module for virtio-fs.  This does not yet contain
      shared data support between host and guest or metadata coherency speedups.
      However it is already significantly faster than virtio-9p.
      
      Design Overview
      ===============
      
      With the goal of designing something with better performance and local file
      system semantics, a bunch of ideas were proposed.
      
       - Use fuse protocol (instead of 9p) for communication between guest and
         host.  Guest kernel will be fuse client and a fuse server will run on
         host to serve the requests.
      
       - For data access inside guest, mmap portion of file in QEMU address space
         and guest accesses this memory using dax.  That way guest page cache is
         bypassed and there is only one copy of data (on host).  This will also
         enable mmap(MAP_SHARED) between guests.
      
       - For metadata coherency, there is a shared memory region which contains
         version number associated with metadata and any guest changing metadata
         updates version number and other guests refresh metadata on next access.
         This is yet to be implemented.
      
      How virtio-fs differs from existing approaches
      ==============================================
      
      The unique idea behind virtio-fs is to take advantage of the co-location of
      the virtual machine and hypervisor to avoid communication (vmexits).
      
      DAX allows file contents to be accessed without communication with the
      hypervisor.  The shared memory region for metadata avoids communication in
      the common case where metadata is unchanged.
      
      By replacing expensive communication with cheaper shared memory accesses,
      we expect to achieve better performance than approaches based on network
      file system protocols.  In addition, this also makes it easier to achieve
      local file system semantics (coherency).
      
      These techniques are not applicable to network file system protocols since
      the communications channel is bypassed by taking advantage of shared memory
      on a local machine.  This is why we decided to build virtio-fs rather than
      focus on 9P or NFS.
      
      Caching Modes
      =============
      
      Like virtio-9p, different caching modes are supported which determine the
      coherency level as well.  The “cache=FOO” and “writeback” options control
      the level of coherence between the guest and host filesystems.
      
       - cache=none
         metadata, data and pathname lookup are not cached in guest.  They are
         always fetched from host and any changes are immediately pushed to host.
      
       - cache=always
         metadata, data and pathname lookup are cached in guest and never expire.
      
       - cache=auto
         metadata and pathname lookup cache expires after a configured amount of
         time (default is 1 second).  Data is cached while the file is open
         (close to open consistency).
      
       - writeback/no_writeback
         These options control the writeback strategy.  If writeback is disabled,
         then normal writes will immediately be synchronized with the host fs.
         If writeback is enabled, then writes may be cached in the guest until
         the file is closed or an fsync(2) performed.  This option has no effect
         on mmap-ed writes or writes going through the DAX mechanism.
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      a62a8ef9
    • J
      io_uring: IORING_OP_TIMEOUT support · 5262f567
      Jens Axboe 提交于
      There's been a few requests for functionality similar to io_getevents()
      and epoll_wait(), where the user can specify a timeout for waiting on
      events. I deliberately did not add support for this through the system
      call initially to avoid overloading the args, but I can see that the use
      cases for this are valid.
      
      This adds support for IORING_OP_TIMEOUT. If a user wants to get woken
      when waiting for events, simply submit one of these timeout commands
      with your wait call (or before). This ensures that the application
      sleeping on the CQ ring waiting for events will get woken. The timeout
      command is passed in as a pointer to a struct timespec. Timeouts are
      relative. The timeout command also includes a way to auto-cancel after
      N events has passed.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5262f567
  16. 17 9月, 2019 2 次提交
    • A
      ethtool: implement Energy Detect Powerdown support via phy-tunable · 9f2f13f4
      Alexandru Ardelean 提交于
      The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
      this feature is common across other PHYs (like EEE), and defining
      `ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
      
      The way EDPD works, is that the RX block is put to a lower power mode,
      except for link-pulse detection circuits. The TX block is also put to low
      power mode, but the PHY wakes-up periodically to send link pulses, to avoid
      lock-ups in case the other side is also in EDPD mode.
      
      Currently, there are 2 PHY drivers that look like they could use this new
      PHY tunable feature: the `adin` && `micrel` PHYs.
      
      The ADIN's datasheet mentions that TX pulses are at intervals of 1 second
      default each, and they can be disabled. For the Micrel KSZ9031 PHY, the
      datasheet does not mention whether they can be disabled, but mentions that
      they can modified.
      
      The way this change is structured, is similar to the PHY tunable downshift
      control:
      * a `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` value is exposed to cover a default
        TX interval; some PHYs could specify a certain value that makes sense
      * `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled
      * `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD
      
      As noted by the `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` the interval unit is 1
      millisecond, which should cover a reasonable range of intervals:
       - from 1 millisecond, which does not sound like much of a power-saver
       - to ~65 seconds which is quite a lot to wait for a link to come up when
         plugging a cable
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NAlexandru Ardelean <alexandru.ardelean@analog.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f2f13f4
    • V
      taprio: Add support for hardware offloading · 9c66d156
      Vinicius Costa Gomes 提交于
      This allows taprio to offload the schedule enforcement to capable
      network cards, resulting in more precise windows and less CPU usage.
      
      The gate mask acts on traffic classes (groups of queues of same
      priority), as specified in IEEE 802.1Q-2018, and following the existing
      taprio and mqprio semantics.
      It is up to the driver to perform conversion between tc and individual
      netdev queues if for some reason it needs to make that distinction.
      
      Full offload is requested from the network interface by specifying
      "flags 2" in the tc qdisc creation command, which in turn corresponds to
      the TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD bit.
      
      The important detail here is the clockid which is implicitly /dev/ptpN
      for full offload, and hence not configurable.
      
      A reference counting API is added to support the use case where Ethernet
      drivers need to keep the taprio offload structure locally (i.e. they are
      a multi-port switch driver, and configuring a port depends on the
      settings of other ports as well). The refcount_t variable is kept in a
      private structure (__tc_taprio_qopt_offload) and not exposed to drivers.
      
      In the future, the private structure might also be expanded with a
      backpointer to taprio_sched *q, to implement the notification system
      described in the patch (of when admin became oper, or an error occurred,
      etc, so the offload can be monitored with 'tc qdisc show').
      Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: NVoon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c66d156
  17. 16 9月, 2019 3 次提交
  18. 14 9月, 2019 2 次提交
  19. 13 9月, 2019 2 次提交
  20. 12 9月, 2019 2 次提交
    • M
      fuse: reserve byteswapped init opcodes · 501ae8ec
      Michael S. Tsirkin 提交于
      virtio fs tunnels fuse over a virtio channel.  One issue is two sides might
      be speaking different endian-ness. To detects this, host side looks at the
      opcode value in the FUSE_INIT command.  Works fine at the moment but might
      fail if a future version of fuse will use such an opcode for
      initialization.  Let's reserve this opcode so we remember and don't do
      this.
      
      Same for CUSE_INIT.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      501ae8ec
    • D
      fuse: reserve values for mapping protocol · c4bb667e
      Dr. David Alan Gilbert 提交于
      SETUPMAPPING is a command for use with 'virtiofsd', a fuse-over-virtio
      implementation; it may find use in other fuse impelementations as well in
      which the kernel does not have access to the address space of the daemon
      directly.
      
      A SETUPMAPPING operation causes a section of a file to be mapped into a
      memory window visible to the kernel.  The offsets in the file and the
      window are defined by the kernel performing the operation.
      
      The daemon may reject the request, for reasons including permissions and
      limited resources.
      
      When a request perfectly overlaps a previous mapping, the previous mapping
      is replaced.  When a mapping partially overlaps a previous mapping, the
      previous mapping is split into one or two smaller mappings.
      
      REMOVEMAPPING is the complement to SETUPMAPPING; it unmaps a range of
      mapped files from the window visible to the kernel.
      
      The map_alignment field communicates the alignment constraint for
      FUSE_SETUPMAPPING/FUSE_REMOVEMAPPING and allows the daemon to constrain the
      addresses and file offsets chosen by the kernel.
      Signed-off-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      c4bb667e
  21. 11 9月, 2019 5 次提交