1. 08 2月, 2019 7 次提交
  2. 05 2月, 2019 1 次提交
    • L
      net: phy: fixed-phy: Drop GPIO from fixed_phy_add() · 5468e82f
      Linus Walleij 提交于
      All users of the fixed_phy_add() pass -1 as GPIO number
      to the fixed phy driver, and all users of fixed_phy_register()
      pass -1 as GPIO number as well, except for the device
      tree MDIO bus.
      
      Any new users should create a proper device and pass the
      GPIO as a descriptor associated with the device so delete
      the GPIO argument from the calls and drop the code looking
      requesting a GPIO in fixed_phy_add().
      
      In fixed phy_register(), investigate the "fixed-link"
      node and pick the GPIO descriptor from "link-gpios" if
      this property exists. Move the corresponding code out
      of of_mdio.c as the fixed phy code anyways requires
      OF to be in use.
      Tested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5468e82f
  3. 04 2月, 2019 11 次提交
    • J
      netdevice.h: Add __cold to netdev_<level> logging functions · ce3fdb69
      Joe Perches 提交于
      Add __cold to the netdev_<level> logging functions similar to
      the use of __cold in the generic printk function.
      
      Using __cold moves all the netdev_<level> logging functions
      out-of-line possibly improving code locality and runtime
      performance.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce3fdb69
    • F
      net: Fix ip_mc_{dec,inc}_group allocation context · 9fb20801
      Florian Fainelli 提交于
      After 4effd28c ("bridge: join all-snoopers multicast address"), I
      started seeing the following sleep in atomic warnings:
      
      [   26.763893] BUG: sleeping function called from invalid context at mm/slab.h:421
      [   26.771425] in_atomic(): 1, irqs_disabled(): 0, pid: 1658, name: sh
      [   26.777855] INFO: lockdep is turned off.
      [   26.781916] CPU: 0 PID: 1658 Comm: sh Not tainted 5.0.0-rc4 #20
      [   26.787943] Hardware name: BCM97278SV (DT)
      [   26.792118] Call trace:
      [   26.794645]  dump_backtrace+0x0/0x170
      [   26.798391]  show_stack+0x24/0x30
      [   26.801787]  dump_stack+0xa4/0xe4
      [   26.805182]  ___might_sleep+0x208/0x218
      [   26.809102]  __might_sleep+0x78/0x88
      [   26.812762]  kmem_cache_alloc_trace+0x64/0x28c
      [   26.817301]  igmp_group_dropped+0x150/0x230
      [   26.821573]  ip_mc_dec_group+0x1b0/0x1f8
      [   26.825585]  br_ip4_multicast_leave_snoopers.isra.11+0x174/0x190
      [   26.831704]  br_multicast_toggle+0x78/0xcc
      [   26.835887]  store_bridge_parm+0xc4/0xfc
      [   26.839894]  multicast_snooping_store+0x3c/0x4c
      [   26.844517]  dev_attr_store+0x44/0x5c
      [   26.848262]  sysfs_kf_write+0x50/0x68
      [   26.852006]  kernfs_fop_write+0x14c/0x1b4
      [   26.856102]  __vfs_write+0x60/0x190
      [   26.859668]  vfs_write+0xc8/0x168
      [   26.863059]  ksys_write+0x70/0xc8
      [   26.866449]  __arm64_sys_write+0x24/0x30
      [   26.870458]  el0_svc_common+0xa0/0x11c
      [   26.874291]  el0_svc_handler+0x38/0x70
      [   26.878120]  el0_svc+0x8/0xc
      
      while toggling the bridge's multicast_snooping attribute dynamically.
      
      Pass a gfp_t down to igmpv3_add_delrec(), introduce
      __igmp_group_dropped() and introduce __ip_mc_dec_group() to take a gfp_t
      argument.
      
      Similarly introduce ____ip_mc_inc_group() and __ip_mc_inc_group() to
      allow caller to specify gfp_t.
      
      IPv6 part of the patch appears fine.
      
      Fixes: 4effd28c ("bridge: join all-snoopers multicast address")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fb20801
    • J
      net: devlink: report cell size of shared buffers · bff5731d
      Jakub Kicinski 提交于
      Shared buffer allocation is usually done in cell increments.
      Drivers will either round up the allocation or refuse the
      configuration if it's not an exact multiple of cell size.
      Drivers know exactly the cell size of shared buffer, so help
      out users by providing this information in dumps.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bff5731d
    • D
      sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW · a9beb86a
      Deepa Dinamani 提交于
      Add new socket timeout options that are y2038 safe.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Cc: ccaulfie@redhat.com
      Cc: davem@davemloft.net
      Cc: deller@gmx.de
      Cc: paulus@samba.org
      Cc: ralf@linux-mips.org
      Cc: rth@twiddle.net
      Cc: cluster-devel@redhat.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9beb86a
    • D
      socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes · 45bdc661
      Deepa Dinamani 提交于
      SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
      as the time format. struct timeval is not y2038 safe.
      The subsequent patches in the series add support for new socket
      timeout options with _NEW suffix that will use y2038 safe
      data structures. Although the existing struct timeval layout
      is sufficiently wide to represent timeouts, because of the way
      libc will interpret time_t based on user defined flag, these
      new flags provide a way of having a structure that is the same
      for all architectures consistently.
      Rename the existing options with _OLD suffix forms so that the
      right option is enabled for userspace applications according
      to the architecture and time_t definition of libc.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Cc: ccaulfie@redhat.com
      Cc: deller@gmx.de
      Cc: paulus@samba.org
      Cc: ralf@linux-mips.org
      Cc: rth@twiddle.net
      Cc: cluster-devel@redhat.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45bdc661
    • D
      socket: Add SO_TIMESTAMPING_NEW · 9718475e
      Deepa Dinamani 提交于
      Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
      This is the y2038 safe versions of the SO_TIMESTAMPING_OLD
      for all architectures.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Cc: chris@zankel.net
      Cc: fenghua.yu@intel.com
      Cc: rth@twiddle.net
      Cc: tglx@linutronix.de
      Cc: ubraun@linux.ibm.com
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9718475e
    • D
      socket: Add SO_TIMESTAMP[NS]_NEW · 887feae3
      Deepa Dinamani 提交于
      Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of
      socket timestamp options.
      These are the y2038 safe versions of the SO_TIMESTAMP_OLD
      and SO_TIMESTAMPNS_OLD for all architectures.
      
      Note that the format of scm_timestamping.ts[0] is not changed
      in this patch.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Cc: jejb@parisc-linux.org
      Cc: ralf@linux-mips.org
      Cc: rth@twiddle.net
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-rdma@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      887feae3
    • D
      socket: Add struct __kernel_sock_timeval · 98bb03c8
      Deepa Dinamani 提交于
      The new type is meant to be used as a y2038 safe structure
      to be used as part of cmsg data.
      Presently the SO_TIMESTAMP socket option uses struct timeval
      for timestamps. This is not y2038 safe.
      Subsequent patches in the series add new y2038 safe socket
      option to be used in the place of SO_TIMESTAMP_OLD.
      struct __kernel_sock_timeval will be used as the timestamp
      format at that time.
      
      struct __kernel_sock_timeval also maintains the same layout
      across 32 bit and 64 bit ABIs.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      98bb03c8
    • D
      socket: Use old_timeval types for socket timestamps · 13c6ee2a
      Deepa Dinamani 提交于
      As part of y2038 solution, all internal uses of
      struct timeval are replaced by struct __kernel_old_timeval
      and struct compat_timeval by struct old_timeval32.
      Make socket timestamps use these new types.
      
      This is mainly to be able to verify that the kernel build
      is y2038 safe when such non y2038 safe types are not
      supported anymore.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Cc: isdn@linux-pingi.de
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13c6ee2a
    • D
      arch: sparc: Override struct __kernel_old_timeval · bcb3fc32
      Deepa Dinamani 提交于
      struct __kernel_old_timeval is supposed to have the same
      layout as struct timeval. But, it was inadvarently missed
      that __kernel_suseconds has a different definition for
      sparc64.
      Provide an asm-specific override that fixes it.
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Suggested-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcb3fc32
    • D
      sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD · 7f1bc6e9
      Deepa Dinamani 提交于
      SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
      way they are currently defined, are not y2038 safe.
      Subsequent patches in the series add new y2038 safe versions
      of these options which provide 64 bit timestamps on all
      architectures uniformly.
      Hence, rename existing options with OLD tag suffixes.
      
      Also note that kernel will not use the untagged SO_TIMESTAMP*
      and SCM_TIMESTAMP* options internally anymore.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Cc: deller@gmx.de
      Cc: dhowells@redhat.com
      Cc: jejb@parisc-linux.org
      Cc: ralf@linux-mips.org
      Cc: rth@twiddle.net
      Cc: linux-afs@lists.infradead.org
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-rdma@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f1bc6e9
  4. 02 2月, 2019 10 次提交
    • J
      ethtool: add compat for devlink info · ddb6e99e
      Jakub Kicinski 提交于
      If driver did not fill the fw_version field, try to call into
      the new devlink get_info op and collect the versions that way.
      We assume ethtool was always reporting running versions.
      
      v4:
       - use IS_REACHABLE() to avoid problems with DEVLINK=m (kbuildbot).
      v3 (Jiri):
       - do a dump and then parse it instead of special handling;
       - concatenate all versions (well, all that fit :)).
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddb6e99e
    • J
      devlink: add generic info version names · 785bd550
      Jakub Kicinski 提交于
      Add defines and docs for generic info versions.
      
      v3:
       - add docs;
       - separate patch (Jiri).
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      785bd550
    • J
      devlink: add version reporting to devlink info API · fc6fae7d
      Jakub Kicinski 提交于
      ethtool -i has a few fixed-size fields which can be used to report
      firmware version and expansion ROM version. Unfortunately, modern
      hardware has more firmware components. There is usually some
      datapath microcode, management controller, PXE drivers, and a
      CPLD load. Running ethtool -i on modern controllers reveals the
      fact that vendors cram multiple values into firmware version field.
      
      Here are some examples from systems I could lay my hands on quickly:
      
      tg3:  "FFV20.2.17 bc 5720-v1.39"
      i40e: "6.01 0x800034a4 1.1747.0"
      nfp:  "0.0.3.5 0.25 sriov-2.1.16 nic"
      
      Add a new devlink API to allow retrieving multiple versions, and
      provide user-readable name for those versions.
      
      While at it break down the versions into three categories:
       - fixed - this is the board/fixed component version, usually vendors
                 report information like the board version in the PCI VPD,
                 but it will benefit from naming and common API as well;
       - running - this is the running firmware version;
       - stored - this is firmware in the flash, after firmware update
                  this value will reflect the flashed version, while the
                  running version may only be updated after reboot.
      
      v3:
       - add per-type helpers instead of using the special argument (Jiri).
      RFCv2:
       - remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now
         versions are mixed with other info attrs)l
       - have the driver report versions from the same callback as
         other info.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc6fae7d
    • J
      devlink: add device information API · f9cf2288
      Jakub Kicinski 提交于
      ethtool -i has served us well for a long time, but its showing
      its limitations more and more. The device information should
      also be reported per device not per-netdev.
      
      Lay foundation for a simple devlink-based way of reading device
      info. Add driver name and device serial number as initial pieces
      of information exposed via this new API.
      
      v3:
       - rename helpers (Jiri);
       - rename driver name attr (Jiri);
       - remove double spacing in commit message (Jiri).
      RFC v2:
       - wrap the skb into an opaque structure (Jiri);
       - allow the serial number of be any length (Jiri & Andrew);
       - add driver name (Jonathan).
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9cf2288
    • D
      net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESS · 5b053e12
      Dave Watson 提交于
      Currently we don't zerocopy if the crypto framework async bit is set.
      However some crypto algorithms (such as x86 AESNI) support async,
      but in the context of sendmsg, will never run asynchronously.  Instead,
      check for actual EINPROGRESS return code before assuming algorithm is
      async.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b053e12
    • D
      net: tls: Add tls 1.3 support · 130b392c
      Dave Watson 提交于
      TLS 1.3 has minor changes from TLS 1.2 at the record layer.
      
      * Header now hardcodes the same version and application content type in
        the header.
      * The real content type is appended after the data, before encryption (or
        after decryption).
      * The IV is xored with the sequence number, instead of concatinating four
        bytes of IV with the explicit IV.
      * Zero-padding:  No exlicit length is given, we search backwards from the
        end of the decrypted data for the first non-zero byte, which is the
        content type.  Currently recv supports reading zero-padding, but there
        is no way for send to add zero padding.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      130b392c
    • D
      net: tls: Refactor tls aad space size calculation · a2ef9b6a
      Dave Watson 提交于
      TLS 1.3 has a different AAD size, use a variable in the code to
      make TLS 1.3 support easy.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2ef9b6a
    • D
      net: tls: Support 256 bit keys · fb99bce7
      Dave Watson 提交于
      Wire up support for 256 bit keys from the setsockopt to the crypto
      framework
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb99bce7
    • A
      bpf: introduce BPF_F_LOCK flag · 96049f3a
      Alexei Starovoitov 提交于
      Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands
      and for map_update() helper function.
      In all these cases take a lock of existing element (which was provided
      in BTF description) before copying (in or out) the rest of map value.
      
      Implementation details that are part of uapi:
      
      Array:
      The array map takes the element lock for lookup/update.
      
      Hash:
      hash map also takes the lock for lookup/update and tries to avoid the bucket lock.
      If old element exists it takes the element lock and updates the element in place.
      If element doesn't exist it allocates new one and inserts into hash table
      while holding the bucket lock.
      In rare case the hashmap has to take both the bucket lock and the element lock
      to update old value in place.
      
      Cgroup local storage:
      It is similar to array. update in place and lookup are done with lock taken.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      96049f3a
    • A
      bpf: introduce bpf_spin_lock · d83525ca
      Alexei Starovoitov 提交于
      Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
      bpf program serialize access to other variables.
      
      Example:
      struct hash_elem {
          int cnt;
          struct bpf_spin_lock lock;
      };
      struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
      if (val) {
          bpf_spin_lock(&val->lock);
          val->cnt++;
          bpf_spin_unlock(&val->lock);
      }
      
      Restrictions and safety checks:
      - bpf_spin_lock is only allowed inside HASH and ARRAY maps.
      - BTF description of the map is mandatory for safety analysis.
      - bpf program can take one bpf_spin_lock at a time, since two or more can
        cause dead locks.
      - only one 'struct bpf_spin_lock' is allowed per map element.
        It drastically simplifies implementation yet allows bpf program to use
        any number of bpf_spin_locks.
      - when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
      - bpf program must bpf_spin_unlock() before return.
      - bpf program can access 'struct bpf_spin_lock' only via
        bpf_spin_lock()/bpf_spin_unlock() helpers.
      - load/store into 'struct bpf_spin_lock lock;' field is not allowed.
      - to use bpf_spin_lock() helper the BTF description of map value must be
        a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
        Nested lock inside another struct is not allowed.
      - syscall map_lookup doesn't copy bpf_spin_lock field to user space.
      - syscall map_update and program map_update do not update bpf_spin_lock field.
      - bpf_spin_lock cannot be on the stack or inside networking packet.
        bpf_spin_lock can only be inside HASH or ARRAY map value.
      - bpf_spin_lock is available to root only and to all program types.
      - bpf_spin_lock is not allowed in inner maps of map-in-map.
      - ld_abs is not allowed inside spin_lock-ed region.
      - tracing progs and socket filter progs cannot use bpf_spin_lock due to
        insufficient preemption checks
      
      Implementation details:
      - cgroup-bpf class of programs can nest with xdp/tc programs.
        Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
        Other solutions to avoid nested bpf_spin_lock are possible.
        Like making sure that all networking progs run with softirq disabled.
        spin_lock_irqsave is the simplest and doesn't add overhead to the
        programs that don't use it.
      - arch_spinlock_t is used when its implemented as queued_spin_lock
      - archs can force their own arch_spinlock_t
      - on architectures where queued_spin_lock is not available and
        sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
      - presence of bpf_spin_lock inside map value could have been indicated via
        extra flag during map_create, but specifying it via BTF is cleaner.
        It provides introspection for map key/value and reduces user mistakes.
      
      Next steps:
      - allow bpf_spin_lock in other map types (like cgroup local storage)
      - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
        to request kernel to grab bpf_spin_lock before rewriting the value.
        That will serialize access to map elements.
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      d83525ca
  5. 01 2月, 2019 5 次提交
  6. 31 1月, 2019 4 次提交
  7. 30 1月, 2019 2 次提交