1. 27 2月, 2018 10 次提交
  2. 26 2月, 2018 2 次提交
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · ba6056a4
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-02-26
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Various improvements for BPF kselftests: i) skip unprivileged tests
         when kernel.unprivileged_bpf_disabled sysctl knob is set, ii) count
         the number of skipped tests from unprivileged, iii) when a test case
         had an unexpected error then print the actual but also the unexpected
         one for better comparison, from Joe.
      
      2) Add a sample program for collecting CPU state statistics with regards
         to how long the CPU resides in cstate and pstate levels. Based on
         cpu_idle and cpu_frequency trace points, from Leo.
      
      3) Various x64 BPF JIT optimizations to further shrink the generated
         image size in order to make it more icache friendly. When tested on
         the Cilium generated programs, image size reduced by approx 4-5% in
         best case mainly due to how LLVM emits unsigned 32 bit constants,
         from Daniel.
      
      4) Improvements and fixes on the BPF sockmap sample programs: i) fix
         the sockmap's Makefile to include nlattr.o for libbpf, ii) detach
         the sock ops programs from the cgroup before exit, from Prashant.
      
      5) Avoid including xdp.h in filter.h by just forward declaring the
         struct xdp_rxq_info in filter.h, from Jesper.
      
      6) Fix the BPF kselftests Makefile for cgroup_helpers.c by only declaring
         it a dependency for test_dev_cgroup.c but not every other test case
         where it is not needed, from Jesper.
      
      7) Adjust rlimit RLIMIT_MEMLOCK for test_tcpbpf_user selftest since the
         default is insufficient for creating the 'global_map' used in the
         corresponding BPF program, from Yonghong.
      
      8) Likewise, for the xdp_redirect sample, Tushar ran into the same when
         invoking xdp_redirect and xdp_monitor at the same time, therefore
         in order to have the sample generically work bump the limit here,
         too. Fix from Tushar.
      
      9) Avoid an unnecessary NULL check in BPF_CGROUP_RUN_PROG_INET_SOCK()
         since sk is always guaranteed to be non-NULL, from Yafang.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba6056a4
    • L
      samples/bpf: Add program for CPU state statistics · c5350777
      Leo Yan 提交于
      CPU is active when have running tasks on it and CPUFreq governor can
      select different operating points (OPP) according to different workload;
      we use 'pstate' to present CPU state which have running tasks with one
      specific OPP.  On the other hand, CPU is idle which only idle task on
      it, CPUIdle governor can select one specific idle state to power off
      hardware logics; we use 'cstate' to present CPU idle state.
      
      Based on trace events 'cpu_idle' and 'cpu_frequency' we can accomplish
      the duration statistics for every state.  Every time when CPU enters
      into or exits from idle states, the trace event 'cpu_idle' is recorded;
      trace event 'cpu_frequency' records the event for CPU OPP changing, so
      it's easily to know how long time the CPU stays in the specified OPP,
      and the CPU must be not in any idle state.
      
      This patch is to utilize the mentioned trace events for pstate and
      cstate statistics.  To achieve more accurate profiling data, the program
      uses below sequence to insure CPU running/idle time aren't missed:
      
      - Before profiling the user space program wakes up all CPUs for once, so
        can avoid to missing account time for CPU staying in idle state for
        long time; the program forces to set 'scaling_max_freq' to lowest
        frequency and then restore 'scaling_max_freq' to highest frequency,
        this can ensure the frequency to be set to lowest frequency and later
        after start to run workload the frequency can be easily to be changed
        to higher frequency;
      
      - User space program reads map data and update statistics for every 5s,
        so this is same with other sample bpf programs for avoiding big
        overload introduced by bpf program self;
      
      - When send signal to terminate program, the signal handler wakes up
        all CPUs, set lowest frequency and restore highest frequency to
        'scaling_max_freq'; this is exactly same with the first step so
        avoid to missing account CPU pstate and cstate time during last
        stage.  Finally it reports the latest statistics.
      
      The program has been tested on Hikey board with octa CA53 CPUs, below
      is one example for statistics result, the format mainly follows up
      Jesper Dangaard Brouer suggestion.
      
      Jesper reminds to 'get printf to pretty print with thousands separators
      use %' and setlocale(LC_NUMERIC, "en_US")', tried three different arm64
      GCC toolchains (5.4.0 20160609, 6.2.1 20161016, 6.3.0 20170516) but all
      of them cannot support printf flag character %' on arm64 platform, so go
      back print number without grouping mode.
      
      CPU states statistics:
      state(ms)  cstate-0    cstate-1    cstate-2    pstate-0    pstate-1    pstate-2    pstate-3    pstate-4
      CPU-0      767         6111        111863      561         31          756         853         190
      CPU-1      241         10606       107956      484         125         646         990         85
      CPU-2      413         19721       98735       636         84          696         757         89
      CPU-3      84          11711       79989       17516       909         4811        5773        341
      CPU-4      152         19610       98229       444         53          649         708         1283
      CPU-5      185         8781        108697      666         91          671         677         1365
      CPU-6      157         21964       95825       581         67          566         684         1284
      CPU-7      125         15238       102704      398         20          665         786         1197
      
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c5350777
  3. 24 2月, 2018 25 次提交
    • A
      Merge branch 'x86-jit' · 7d72637e
      Alexei Starovoitov 提交于
      Daniel Borkmann says:
      
      ====================
      Couple of minor improvements to the x64 JIT I had still around from
      pre merge window in order to shrink the image size further. Added
      test cases for kselftests too as well as running Cilium workloads on
      them w/o issues.
      ====================
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      7d72637e
    • D
      bpf: add various jit test cases · 23d191a8
      Daniel Borkmann 提交于
      Add few test cases that check the rnu-time results under JIT.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      23d191a8
    • D
      bpf, x64: save 5 bytes in prologue when ebpf insns came from cbpf · 08691752
      Daniel Borkmann 提交于
      While it's rather cumbersome to reduce prologue for cBPF->eBPF
      migrations wrt spill/fill for r15 which is callee saved register
      due to bpf_error path in bpf_jit.S that is both used by migrations
      as well as native eBPF, we can still trivially save 5 bytes in
      prologue for the former since tail calls can never be used there.
      cBPF->eBPF migrations also have their own custom prologue in BPF
      asm that xors A and X reg anyway, so it's fine we skip this here.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      08691752
    • D
      bpf, x64: save few bytes when mul is in alu32 · 4c38e2f3
      Daniel Borkmann 提交于
      Add a generic emit_mov_reg() helper in order to reuse it in BPF
      multiplication to load the src into rax, we can save a few bytes
      in alu32 while doing so.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      4c38e2f3
    • D
      bpf, x64: save several bytes when mul dest is r0/r3 anyway · d806a0cf
      Daniel Borkmann 提交于
      Instead of unconditionally performing push/pop on rax/rdx
      in case of multiplication, we can save a few bytes in case
      of dest register being either BPF r0 (rax) or r3 (rdx)
      since the result is written in there anyway.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      d806a0cf
    • D
      bpf, x64: save several bytes by using mov over movabsq when possible · 6fe8b9c1
      Daniel Borkmann 提交于
      While analyzing some of the more complex BPF programs from Cilium,
      I found that LLVM generally prefers to emit LD_IMM64 instead of MOV32
      BPF instructions for loading unsigned 32-bit immediates into a
      register. Given we cannot change the current/stable LLVM versions
      that are already out there, lets optimize this case such that the
      JIT prefers to emit 'mov %eax, imm32' over 'movabsq %rax, imm64'
      whenever suitable in order to reduce the image size by 4-5 bytes per
      such load in the typical case, reducing image size on some of the
      bigger programs by up to 4%. emit_mov_imm32() and emit_mov_imm64()
      have been added as helpers.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      6fe8b9c1
    • D
      bpf, x64: save one byte per shl/shr/sar when imm is 1 · 88e69a1f
      Daniel Borkmann 提交于
      When we shift by one, we can use a different encoding where imm
      is not explicitly needed, which saves 1 byte per such op.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      88e69a1f
    • D
      f74290fd
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 9cb9c07d
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Fix TTL offset calculation in mac80211 mesh code, from Peter Oh.
      
       2) Fix races with procfs in ipt_CLUSTERIP, from Cong Wang.
      
       3) Memory leak fix in lpm_trie BPF map code, from Yonghong Song.
      
       4) Need to use GFP_ATOMIC in BPF cpumap allocations, from Jason Wang.
      
       5) Fix potential deadlocks in netfilter getsockopt() code paths, from
          Paolo Abeni.
      
       6) Netfilter stackpointer size checks really are needed to validate
          user input, from Florian Westphal.
      
       7) Missing timer init in x_tables, from Paolo Abeni.
      
       8) Don't use WQ_MEM_RECLAIM in mac80211 hwsim, from Johannes Berg.
      
       9) When an ibmvnic device is brought down then back up again, it can be
          sent queue entries from a previous session, handle this properly
          instead of crashing. From Thomas Falcon.
      
      10) Fix TCP checksum on LRO buffers in mlx5e, from Gal Pressman.
      
      11) When we are dumping filters in cls_api, the output SKB is empty, and
          the filter we are dumping is too large for the space in the SKB, we
          should return -EMSGSIZE like other netlink dump operations do.
          Otherwise userland has no signal that is needs to increase the size
          of its read buffer. From Roman Kapl.
      
      12) Several XDP fixes for virtio_net, from Jesper Dangaard Brouer.
      
      13) Module refcount leak in netlink when a dump start fails, from Jason
          Donenfeld.
      
      14) Handle sub-optimal GSO sizes better in TCP BBR congestion control,
          from Eric Dumazet.
      
      15) Releasing bpf per-cpu arraymaps can take a long time, add a
          condtional scheduling point. From Eric Dumazet.
      
      16) Implement retpolines for tail calls in x64 and arm64 bpf JITs. From
          Daniel Borkmann.
      
      17) Fix page leak in gianfar driver, from Andy Spencer.
      
      18) Missed clearing of estimator scratch buffer, from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
        net_sched: gen_estimator: fix broken estimators based on percpu stats
        gianfar: simplify FCS handling and fix memory leak
        ipv6 sit: work around bogus gcc-8 -Wrestrict warning
        macvlan: fix use-after-free in macvlan_common_newlink()
        bpf, arm64: fix out of bounds access in tail call
        bpf, x64: implement retpoline for tail call
        rxrpc: Fix send in rxrpc_send_data_packet()
        net: aquantia: Fix error handling in aq_pci_probe()
        bpf: fix rcu lockdep warning for lpm_trie map_free callback
        bpf: add schedule points in percpu arrays management
        regulatory: add NUL to request alpha2
        ibmvnic: Fix early release of login buffer
        net/smc9194: Remove bogus CONFIG_MAC reference
        net: ipv4: Set addr_type in hash_keys for forwarded case
        tcp_bbr: better deal with suboptimal GSO
        smsc75xx: fix smsc75xx_set_features()
        netlink: put module reference if dump start fails
        selftests/bpf/test_maps: exit child process without error in ENOMEM case
        selftests/bpf: update gitignore with test_libbpf_open
        selftests/bpf: tcpbpf_kern: use in6_* macros from glibc
        ..
      9cb9c07d
    • L
      Merge branch 'fixes-v4.16-rc3' of... · 2eb02aa9
      Linus Torvalds 提交于
      Merge branch 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
      
      Pull security subsystem fixes from James Morris:
      
       - keys fixes via David Howells:
            "A collection of fixes for Linux keyrings, mostly thanks to Eric
             Biggers:
      
              - Fix some PKCS#7 verification issues.
      
              - Fix handling of unsupported crypto in X.509.
      
              - Fix too-large allocation in big_key"
      
       - Seccomp updates via Kees Cook:
            "These are fixes for the get_metadata interface that landed during
             -rc1. While the new selftest is strictly not a bug fix, I think
             it's in the same spirit of avoiding bugs"
      
       - an IMA build fix from Randy Dunlap
      
      * 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        integrity/security: fix digsig.c build error with header file
        KEYS: Use individual pages in big_key for crypto buffers
        X.509: fix NULL dereference when restricting key with unsupported_sig
        X.509: fix BUG_ON() when hash algorithm is unsupported
        PKCS#7: fix direct verification of SignerInfo signature
        PKCS#7: fix certificate blacklisting
        PKCS#7: fix certificate chain verification
        seccomp: add a selftest for get_metadata
        ptrace, seccomp: tweak get_metadata behavior slightly
        seccomp, ptrace: switch get_metadata types to arch independent
      2eb02aa9
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 65738c6b
      Linus Torvalds 提交于
      Pull arm64 fixes from Catalin Marinas:
       "arm64 and perf fixes:
      
         - build error when accessing MPIDR_HWID_BITMASK from .S
      
         - fix CTR_EL0 field definitions
      
         - remove/disable some kernel messages on user faults (unhandled
           signals, unimplemented syscalls)
      
         - fix kernel page fault in unwind_frame() with function graph tracing
      
         - fix perf sleeping while atomic errors when booting with ACPI"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: fix unwind_frame() for filtered out fn for function graph tracing
        arm64: Enforce BBM for huge IO/VMAP mappings
        arm64: perf: correct PMUVer probing
        arm_pmu: acpi: request IRQs up-front
        arm_pmu: note IRQs and PMUs per-cpu
        arm_pmu: explicitly enable/disable SPIs at hotplug
        arm_pmu: acpi: check for mismatched PPIs
        arm_pmu: add armpmu_alloc_atomic()
        arm_pmu: fold platform helpers into platform code
        arm_pmu: kill arm_pmu_platdata
        ARM: ux500: remove PMU IRQ bouncer
        arm64: __show_regs: Only resolve kernel symbols when running at EL1
        arm64: Remove unimplemented syscall log message
        arm64: Disable unhandled signal log messages by default
        arm64: cpufeature: Fix CTR_EL0 field definitions
        arm64: uaccess: Formalise types for access_ok()
        arm64: Fix compilation error while accessing MPIDR_HWID_BITMASK from .S files
      65738c6b
    • L
      Merge tag 'mips_fixes_4.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips · 2bd06ce7
      Linus Torvalds 提交于
      Pull MIPS fix from James Hogan:
       "A single MIPS fix for mismatching struct compat_flock, resulting in
        bus errors starting Firefox on Debian 8 since 4.13"
      
      * tag 'mips_fixes_4.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
        MIPS: Drop spurious __unused in struct compat_flock
      2bd06ce7
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk · 13f514be
      Linus Torvalds 提交于
      Pull printk fixlet from Petr Mladek:
       "People expect to see the real pointer value for %px.
      
        Let's substitute '(null)' only for the other %p? format modifiers that
        need to deference the pointer"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
        vsprintf: avoid misleading "(null)" for %px
      13f514be
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 938e1426
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "Two bugfixes, one v4.16 regression fix, and two documentation fixes"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: designware: Consider SCL GPIO optional
        i2c: busses: i2c-sirf: Fix spelling: "formular" -> "formula".
        i2c: bcm2835: Set up the rising/falling edge delays
        i2c: i801: Add missing documentation entries for Braswell and Kaby Lake
        i2c: designware: must wait for enable
      938e1426
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 170e07bf
      Linus Torvalds 提交于
      Pull SCSI fixes from James Bottomley:
       "These are mostly fixes for problems with merge window code.
      
        In addition we have one doc update (alua) and two dead code removals
        (aiclib and octogon) a spurious assignment removal (csiostor) and a
        performance improvement for storvsc involving better interrupt
        spreading and increasing the command per lun handling"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla4xxx: skip error recovery in case of register disconnect.
        scsi: aacraid: fix shutdown crash when init fails
        scsi: qedi: Cleanup local str variable
        scsi: qedi: Fix truncation of CHAP name and secret
        scsi: qla2xxx: Fix incorrect handle for abort IOCB
        scsi: qla2xxx: Fix double free bug after firmware timeout
        scsi: storvsc: Increase cmd_per_lun for higher speed devices
        scsi: qla2xxx: Fix a locking imbalance in qlt_24xx_handle_els()
        scsi: scsi_dh: Document alua_rtpg_queue() arguments
        scsi: Remove Makefile entry for oktagon files
        scsi: aic7xxx: remove aiclib.c
        scsi: qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()
        scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo()
        scsi: sym53c8xx_2: iterator underflow in sym_getsync()
        scsi: bnx2fc: Fix check in SCSI completion handler for timed out request
        scsi: csiostor: remove redundant assignment to pointer 'ln'
        scsi: ufs: Enable quirk to ignore sending WRITE_SAME command
        scsi: ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info
        scsi: qla2xxx: Fix memory corruption during hba reset test
        scsi: mpt3sas: fix an out of bound write
      170e07bf
    • D
      net: fib_rules: Add new attribute to set protocol · 1b71af60
      Donald Sharp 提交于
      For ages iproute2 has used `struct rtmsg` as the ancillary header for
      FIB rules and in the process set the protocol value to RTPROT_BOOT.
      Until ca56209a66 ("net: Allow a rule to track originating protocol")
      the kernel rules code ignored the protocol value sent from userspace
      and always returned 0 in notifications. To avoid incompatibility with
      existing iproute2, send the protocol as a new attribute.
      
      Fixes: cac56209 ("net: Allow a rule to track originating protocol")
      Signed-off-by: NDonald Sharp <sharpd@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b71af60
    • L
      Merge tag 'drm-fixes-for-v4.16-rc3' of git://people.freedesktop.org/~airlied/linux · 8961ca44
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "A bunch of fixes for rc3:
      
        Exynos:
         - fixes for using monotonic timestamps
         - register definitions
         - removal of unused file
      
        ipu-v3L
         - minor changes
         - make some register arrays const+static
         - fix some leaks
      
        meson:
         - fix for vsync
      
        atomic:
         - fix for memory leak
      
        EDID parser:
         - add quirks for some more non-desktop devices
         - 6-bit panel fix.
      
        drm_mm:
         - fix a bug in the core drm mm hole handling
      
        cirrus:
         - fix lut loading regression
      
        Lastly there is a deadlock fix around runtime suspend for secondary
        GPUs.
      
        There was a deadlock between one thread trying to wait for a workqueue
        job to finish in the runtime suspend path, and the workqueue job it
        was waiting for in turn waiting for a runtime_get_sync to return.
      
        The fixes avoids it by not doing the runtime sync in the workqueue as
        then we always wait for all those tasks to complete before we runtime
        suspend"
      
      * tag 'drm-fixes-for-v4.16-rc3' of git://people.freedesktop.org/~airlied/linux: (25 commits)
        drm/tve200: fix kernel-doc documentation comment include
        drm/edid: quirk Sony PlayStation VR headset as non-desktop
        drm/edid: quirk Windows Mixed Reality headsets as non-desktop
        drm/edid: quirk Oculus Rift headsets as non-desktop
        drm/meson: fix vsync buffer update
        drm: Handle unexpected holes in color-eviction
        drm: exynos: Use proper macro definition for HDMI_I2S_PIN_SEL_1
        drm/exynos: remove exynos_drm_rotator.h
        drm/exynos: g2d: Delete an error message for a failed memory allocation in two functions
        drm/exynos: fix comparison to bitshift when dealing with a mask
        drm/exynos: g2d: use monotonic timestamps
        drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA
        gpu: ipu-csi: add 10/12-bit grayscale support to mbus_code_to_bus_cfg
        gpu: ipu-cpmem: add 16-bit grayscale support to ipu_cpmem_set_image
        gpu: ipu-v3: prg: fix device node leak in ipu_prg_lookup_by_phandle
        gpu: ipu-v3: pre: fix device node leak in ipu_pre_lookup_by_phandle
        drm/amdgpu: Fix deadlock on runtime suspend
        drm/radeon: Fix deadlock on runtime suspend
        drm/nouveau: Fix deadlock on runtime suspend
        drm: Allow determining if current task is output poll worker
        ...
      8961ca44
    • W
      selftests/net: ignore background traffic in psock_fanout · cc30c93f
      Willem de Bruijn 提交于
      The packet fanout test generates UDP traffic and reads this with
      a pair of packet sockets, testing the various fanout algorithms.
      
      Avoid non-determinism from reading unrelated background traffic.
      Fanout decisions are made before unrelated packets can be dropped with
      a filter, so that is an insufficient strategy [*]. Run the packet
      socket tests in a network namespace, similar to msg_zerocopy.
      
      It it still good practice to install a filter on a packet socket
      before accepting traffic. Because this is example code, demonstrate
      that pattern. Open the socket initially bound to no protocol, install
      a filter, and only then bind to ETH_P_IP.
      
      Another source of non-determinism is hash collisions in FANOUT_HASH.
      The hash function used to select a socket in the fanout group includes
      the pseudorandom number hashrnd, which is not visible from userspace.
      To work around this, the test tries to find a pair of UDP source ports
      that do not collide. It gives up too soon (5 times, every 32 runs) and
      output is confusing. Increase tries to 20 and revise the error msg.
      
      [*] another approach would be to add a third socket to the fanout
          group and direct all unexpected traffic here. This is possible
          only when reimplementing methods like RR or HASH alongside this
          extra catch-all bucket, using the BPF fanout method.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc30c93f
    • C
      atm: idt77252: remove redundant bit-wise or'ing of zero · 3c69f792
      Colin Ian King 提交于
      Zero is being bit-wise or'd in a calculation twice; these are redundant
      and can be removed.
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c69f792
    • E
      net_sched: gen_estimator: fix broken estimators based on percpu stats · a5f7add3
      Eric Dumazet 提交于
      pfifo_fast got percpu stats lately, uncovering a bug I introduced last
      year in linux-4.10.
      
      I missed the fact that we have to clear our temporary storage
      before calling __gnet_stats_copy_basic() in the case of percpu stats.
      
      Without this fix, rate estimators (tc qd replace dev xxx root est 1sec
      4sec pfifo_fast) are utterly broken.
      
      Fixes: 1c0d32fd ("net_sched: gen_estimator: complete rewrite of rate estimators")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5f7add3
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 22170094
      David S. Miller 提交于
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2018-02-22
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) two urgent fixes for bpf_tail_call logic for x64 and arm64 JITs, from Daniel.
      
      2) cond_resched points in percpu array alloc/free paths, from Eric.
      
      3) lockdep and other minor fixes, from Yonghong, Arnd, Anders, Li.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22170094
    • S
      rds: rds_msg_zcopy should return error of null rm->data.op_mmp_znotifier · 79a5b972
      Sowmini Varadhan 提交于
      if either or both of MSG_ZEROCOPY and SOCK_ZEROCOPY have not been
      specified, the rm->data.op_mmp_znotifier allocation will be skipped.
      In this case, it is invalid ot pass down a cmsghdr with
      RDS_CMSG_ZCOPY_COOKIE, so return EINVAL from rds_msg_zcopy for this
      case.
      
      Reported-by: syzbot+f893ae7bb2f6456dfbc3@syzkaller.appspotmail.com
      Fixes: 0cebacce ("rds: zerocopy Tx support.")
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79a5b972
    • H
      r8169: simplify and improve check for dash · 9dbe7896
      Heiner Kallweit 提交于
      r8168_check_dash() returns false anyway for all chip versions not
      supporting dash. So we can simplify the check conditions.
      
      In addition change the check functions to return bool instead of int,
      because they actually return a bool value.
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9dbe7896
    • H
      r8169: disable WOL per default · 7edf6d31
      Heiner Kallweit 提交于
      Currently, if BIOS enables WOL in the chip, settings are inconsistent
      because the device isn't marked as wakeup-enabled (if not done
      explicitly via userspace tools). This causes issues with suspend/
      resume because mdio_bus_phy_may_suspend() checks whether device is
      wakeup-enabled. In detail MDIO bus access in phy_suspend() can fail
      because the MDIO bus is disabled.
      
      In the history of the driver we find two competing approaches:
      8f9d5138 "r8169: remember WOL preferences on driver load" prefers
      to preserve what the BIOS may have set, whilst bde135a6
      "r8169: only enable PCI wakeups when WOL is active" disabled PCI
      wakeup per default to work around a bug on one platform.
      
      Seems like nobody complained after the latter patch about non-working
      WOL, what makes me think that nobody uses WOL w/o configuring it
      explicitly.
      
      My opinion:
      Vast majority of users doesn't use WOL even if the BIOS enables it in
      the chip. And having WOL being active keeps the PHY(s) from powering
      down if being idle.
      If somebody needs WOL, he can enable it during boot, e.g. by
      configuring systemd.link/WakeOnLan.
      
      Therefore, to make WOL consistent again, disable it per default.
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7edf6d31
    • A
      gianfar: simplify FCS handling and fix memory leak · d903ec77
      Andy Spencer 提交于
      Previously, buffer descriptors containing only the frame check sequence
      (FCS) were skipped and not added to the skb. However, the page reference
      count was still incremented, leading to a memory leak.
      
      Fixing this inside gfar_add_rx_frag() is difficult due to reserved
      memory handling and page reuse. Instead, move the FCS handling to
      gfar_process_frame() and trim off the FCS before passing the skb up the
      networking stack.
      Signed-off-by: NAndy Spencer <aspencer@spacex.com>
      Signed-off-by: NJim Gruen <jgruen@spacex.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d903ec77
  4. 23 2月, 2018 3 次提交
    • A
      ipv6 sit: work around bogus gcc-8 -Wrestrict warning · ca79bec2
      Arnd Bergmann 提交于
      gcc-8 has a new warning that detects overlapping input and output arguments
      in memcpy(). It triggers for sit_init_net() calling ipip6_tunnel_clone_6rd(),
      which is actually correct:
      
      net/ipv6/sit.c: In function 'sit_init_net':
      net/ipv6/sit.c:192:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict]
      
      The problem here is that the logic detecting the memcpy() arguments finds them
      to be the same, but the conditional that tests for the input and output of
      ipip6_tunnel_clone_6rd() to be identical is not a compile-time constant.
      
      We know that netdev_priv(t->dev) is the same as t for a tunnel device,
      and comparing "dev" directly here lets the compiler figure out as well
      that 'dev == sitn->fb_tunnel_dev' when called from sit_init_net(), so
      it no longer warns.
      
      This code is old, so Cc stable to make sure that we don't get the warning
      for older kernels built with new gcc.
      
      Cc: Martin Sebor <msebor@gmail.com>
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83456Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca79bec2
    • A
      macvlan: fix use-after-free in macvlan_common_newlink() · 4e14bf42
      Alexey Kodanev 提交于
      The following use-after-free was reported by KASan when running
      LTP macvtap01 test on 4.16-rc2:
      
      [10642.528443] BUG: KASAN: use-after-free in
                     macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
      [10642.626607] Read of size 8 at addr ffff880ba49f2100 by task ip/18450
      ...
      [10642.963873] Call Trace:
      [10642.994352]  dump_stack+0x5c/0x7c
      [10643.035325]  print_address_description+0x75/0x290
      [10643.092938]  kasan_report+0x28d/0x390
      [10643.137971]  ? macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
      [10643.207963]  macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
      [10643.275978]  macvtap_newlink+0x171/0x260 [macvtap]
      [10643.334532]  rtnl_newlink+0xd4f/0x1300
      ...
      [10646.256176] Allocated by task 18450:
      [10646.299964]  kasan_kmalloc+0xa6/0xd0
      [10646.343746]  kmem_cache_alloc_trace+0xf1/0x210
      [10646.397826]  macvlan_common_newlink+0x6de/0x14a0 [macvlan]
      [10646.464386]  macvtap_newlink+0x171/0x260 [macvtap]
      [10646.522728]  rtnl_newlink+0xd4f/0x1300
      ...
      [10647.022028] Freed by task 18450:
      [10647.061549]  __kasan_slab_free+0x138/0x180
      [10647.111468]  kfree+0x9e/0x1c0
      [10647.147869]  macvlan_port_destroy+0x3db/0x650 [macvlan]
      [10647.211411]  rollback_registered_many+0x5b9/0xb10
      [10647.268715]  rollback_registered+0xd9/0x190
      [10647.319675]  register_netdevice+0x8eb/0xc70
      [10647.370635]  macvlan_common_newlink+0xe58/0x14a0 [macvlan]
      [10647.437195]  macvtap_newlink+0x171/0x260 [macvtap]
      
      Commit d02fd6e7 ("macvlan: Fix one possible double free") handles
      the case when register_netdevice() invokes ndo_uninit() on error and
      as a result free the port. But 'macvlan_port_get_rtnl(dev))' check
      (returns dev->rx_handler_data), which was added by this commit in order
      to prevent double free, is not quite correct:
      
      * for macvlan it always returns NULL because 'lowerdev' is the one that
        was used to register rx handler (port) in macvlan_port_create() as
        well as to unregister it in macvlan_port_destroy().
      * for macvtap it always returns a valid pointer because macvtap registers
        its own rx handler before macvlan_common_newlink().
      
      Fixes: d02fd6e7 ("macvlan: Fix one possible double free")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e14bf42
    • Y
      bpf: NULL pointer check is not needed in BPF_CGROUP_RUN_PROG_INET_SOCK · ee07862f
      Yafang Shao 提交于
      sk is already allocated in inet_create/inet6_create, hence when
      BPF_CGROUP_RUN_PROG_INET_SOCK is executed sk will never be NULL.
      
      The logic is as bellow,
      	sk = sk_alloc();
      	if (!sk)
      		goto out;
      	BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
      Signed-off-by: NYafang Shao <laoar.shao@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      ee07862f