1. 01 Jul 2017, 2 commits
  2. 30 Jun 2017, 1 commit
  3. 29 Jun 2017, 1 commit
    • block: provide bio_uninit() for freeing integrity/task associations · 9ae3b3f5
      Authored by Jens Axboe
      Wen reports significant memory leaks with DIF and O_DIRECT:
      
      "With nvme devive + T10 enabled, On a system it has 256GB and started
      logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
      it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
      leaking.
      
      /proc/meminfo | grep SUnreclaim...
      
      SUnreclaim:      6752128 kB
      SUnreclaim:      6874880 kB
      SUnreclaim:      7238080 kB
      ....
      SUnreclaim:     22307264 kB
      SUnreclaim:     22485888 kB
      SUnreclaim:     22720256 kB
      
      When testcases with T10 enabled call into __blkdev_direct_IO_simple,
      the code doesn't free the memory allocated by bio_integrity_alloc().
      The patch fixes the issue. HTX has been run for 60+ hours without
      failure."
      
      Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
      doesn't go through the regular bio free. This means that any ancillary
      data allocated with the bio through the stack is not freed. Hence, we
      can leak the integrity data associated with the bio, if the device is
      using DIF/DIX.
      
      Fix this by providing a bio_uninit() and export it, so that we can use
      it to free this data. Note that this is a minimal fix for this issue.
      Any current user of bios allocated outside of bio_alloc_bioset()
      suffers from this issue, most notably some drivers. We will fix those
      in a more comprehensive patch for 4.13. This also means that the
      commit marked as fixed by this one isn't the real culprit; it's just
      the most obvious one out there.
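
      A minimal sketch of the resulting pattern, assuming only what the
      message above states (the condensed call site below is illustrative,
      not the exact upstream function body):

          struct bio bio;                 /* on-stack: bio_put()'s free path never runs */

          bio_init(&bio, vecs, nr_pages);
          /* ... build, submit and wait for the bio ... */
          bio_uninit(&bio);               /* frees integrity payload and task association */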
      
      Fixes: 542ff7bf ("block: new direct I/O implementation")
      Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 27 Jun 2017, 7 commits
    • net/mlx5e: IPSec, Innova IPSec offload infrastructure · 547eede0
      Authored by Ilan Tayari
      Add Innova IPSec ESP crypto offload configuration paths.
      Detect Innova IPSec device and set the NETIF_F_HW_ESP flag.
      Configure Security Associations using the API introduced in a previous
      patch.
      
      Add the Software-Parser hardware descriptor layout.
      Software-Parser (swp) is a hardware feature in ConnectX which allows the
      host software to specify protocol header offsets in the TX path, thus
      overriding the hardware parser.
      This is useful for protocols that the ASIC may not be able to parse on
      its own.
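
      A condensed sketch of the idea (the field names below are illustrative
      assumptions, not the exact mlx5 layout): the driver writes per-packet
      header offsets into the send descriptor, so the hardware parser is
      bypassed:

          struct swp_offsets {            /* hypothetical layout */
                  u8 outer_l3_offset;     /* offsets from packet start, in 2-byte words */
                  u8 outer_l4_offset;
                  u8 inner_l3_offset;
                  u8 inner_l4_offset;
                  u8 flags;               /* e.g. inner/outer L4-is-UDP bits */
          };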
      
      Note that due to inline metadata, XDP is not supported in Innova IPSec.
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Accel, Add IPSec acceleration interface · bebb23e6
      Authored by Ilan Tayari
      Add routines for manipulating the hardware IPSec SA database (SADB).
      
      In Innova IPSec, a Security Association (SA) is added or deleted
      via a command message over the SBU connection.
      The HW then sends a response message over the same connection.
      
      Add implementation for Innova IPSec (FPGA-based) hardware.
      
      These routines will be used by the IPSec offload support in a later
      patch. However, they may also be used by others, such as RDMA and
      RoCE IPSec.
      
      mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs
      to work directly with mlx5_core rather than Innova FPGA or other mlx5
      acceleration providers.
      
      In this patchset we add Innova IPSec support and mlx5/accel delegates
      IPSec offloads to Innova routines.
      
      In the future, when IPSec/TLS or any other acceleration gets integrated
      into ConnectX chip, mlx5/accel layer will provide the integrated
      acceleration, rather than the Innova one.
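
      A sketch of the delegation this layer performs (the signatures are
      assumptions modeled on the description):

          /* mlx5/accel: thin shim over the active acceleration provider */
          void *mlx5_accel_ipsec_sa_cmd_exec(struct mlx5_core_dev *mdev,
                                             struct mlx5_accel_ipsec_sa *cmd)
          {
                  /* today only the Innova FPGA provider exists */
                  return mlx5_fpga_ipsec_sa_cmd_exec(mdev, cmd);
          }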
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: FPGA, Add SBU infrastructure · a9956d35
      Authored by Ilan Tayari
      Add interface to initialize and interact with Innova FPGA SBU
      connections.
      A client driver may use these functions to set up a high-speed DMA
      connection with its SBU hardware logic, and send/receive messages
      over this connection.
      
      A later patch in this patchset will make use of these functions for
      Innova IPSec offload in mlx5 Ethernet driver.
      
      Add commands to retrieve Innova FPGA SBU capabilities, and to
      read/write Innova FPGA configuration space registers and memory,
      over internal I2C.
      
      At a high level, the FPGA configuration space is divided as follows:
       0x00000000 - 0x007fffff is reserved for the SBU
       0x00800000 - 0xffffffff is reserved for the Shell
      0x400000000 - ...        is DDR memory
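
      The same map expressed as constants (a sketch; the macro names are
      assumptions):

          #define MLX5_FPGA_SBU_BASE      0x000000000ULL  /* SBU:   0x00000000 - 0x007fffff */
          #define MLX5_FPGA_SHELL_BASE    0x000800000ULL  /* Shell: 0x00800000 - 0xffffffff */
          #define MLX5_FPGA_DDR_BASE      0x400000000ULL  /* DDR memory */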
      
      A later patchset will add support for accessing FPGA CrSpace and memory
      over a high-speed connection. This is the reason for the ACCESS_TYPE
      enumeration, which currently only supports I2C.
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: FPGA, Add SBU bypass and reset flows · c43051d7
      Authored by Ilan Tayari
      The Innova FPGA includes shell hardware and Sandbox-Unit (SBU) hardware.
      The shell hardware is handled by mlx5_core itself, while the SBU is
      handled by a client driver.
      
      Reset the SBU to a well-known initial state when initializing a new
      device, and set the FPGA to bypass mode when uninitializing a device.
      This allows the client driver to assume that its device has been
      reset when a new device is detected.
      
      During SBU reset, the FPGA is put into SBU-bypass mode. In this mode
      packets do not pass through the SBU, so it cannot affect the network
      data stream at all.
      
      A factory-image does not have an SBU, so skip these flows.
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: FPGA, Add FW commands for FPGA QPs · 6062118d
      Authored by Ilan Tayari
      The FPGA QP is a high-bandwidth communication channel between the host
      CPU and the FPGA device. It allows performing DMA operations between
      host memory and the FPGA logic via the ConnectX chip.
      
      Add ConnectX FW commands which create and manipulate FPGA QPs.
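
      A sketch of the implied command set (the opcode names follow mlx5
      conventions and are assumptions here):

          enum {
                  MLX5_CMD_OP_FPGA_CREATE_QP,
                  MLX5_CMD_OP_FPGA_MODIFY_QP,
                  MLX5_CMD_OP_FPGA_QUERY_QP,
                  MLX5_CMD_OP_FPGA_DESTROY_QP,
          };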
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Add support for multiple RoCE enable · a6f7d2af
      Authored by Ilan Tayari
      Previously, only mlx5_ib enabled RoCE on the port, but FPGA needs it
      as well.
      Add support for counting the number of enables, so that FPGA and IB
      can work in parallel and independently.
      Program the HW to enable RoCE on the first enable call, and to
      disable RoCE on the last disable call.
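
      A sketch of the counting scheme (helper names are illustrative; the
      real driver serializes the count with a lock):

          static int roce_enable_count;

          int mlx5_enable_roce(struct mlx5_core_dev *mdev)
          {
                  if (roce_enable_count++ == 0)
                          return program_hw_roce(mdev, true);     /* first user enables */
                  return 0;
          }

          void mlx5_disable_roce(struct mlx5_core_dev *mdev)
          {
                  if (--roce_enable_count == 0)
                          program_hw_roce(mdev, false);           /* last user disables */
          }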
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Add reserved-gids support · 52ec462e
      Authored by Ilan Tayari
      Reserved GIDs are entries in the GID table in use by mlx5_core and
      its submodules (e.g. FPGA, SRIOV, E-Switch, netdev).
      The entries are reserved at the high indexes of the GID table.
      
      An mlx5 submodule may reserve a certain number of GIDs for its own
      use during the load sequence by calling mlx5_core_reserve_gids, and
      must also take care to un-reserve these GIDs when it closes.
      Reservation is only allowed during the load sequence and before any
      interfaces (e.g. mlx5_ib or mlx5_en) are up.
      
      After reservation, a submodule may call mlx5_core_reserved_gid_alloc/
      free to allocate entries from the reserved GIDs pool.
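
      A usage sketch based on the API names above (error handling elided;
      the exact signatures are assumptions):

          /* during load, before any interface is up */
          err = mlx5_core_reserve_gids(mdev, nr_fpga_qps);

          /* later: take one entry from the reserved pool... */
          err = mlx5_core_reserved_gid_alloc(mdev, &gid_index);

          /* ...and return it when done */
          mlx5_core_reserved_gid_free(mdev, gid_index);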
      
      Reserve a GID table entry for every supported FPGA QP.
      
      A later patch in the patchset will remove them from being reported to
      IB core.
      Another such patch will make use of these for FPGA QPs in Innova NIC.
      
      Add lib/mlx5.h to serve as a library for mlx5 submodules and to
      expose only the public mlx5 API; more mlx5 library files will be
      added in future submissions.
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  5. 25 Jun 2017, 1 commit
  6. 24 Jun 2017, 6 commits
    • slub: make sysfs file removal asynchronous · 3b7b3140
      Authored by Tejun Heo
      Commit bf5eb3de ("slub: separate out sysfs_slab_release() from
      sysfs_slab_remove()") made slub sysfs file removals synchronous to
      kmem_cache shutdown.
      
      Unfortunately, this created a possible ABBA deadlock between
      slab_mutex and the sysfs draining mechanism, triggering the following
      lockdep warning.
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        4.10.0-test+ #48 Not tainted
        -------------------------------------------------------
        rmmod/1211 is trying to acquire lock:
         (s_active#120){++++.+}, at: [<ffffffff81308073>] kernfs_remove+0x23/0x40
      
        but task is already holding lock:
         (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (slab_mutex){+.+.+.}:
      	 lock_acquire+0xf6/0x1f0
      	 __mutex_lock+0x75/0x950
      	 mutex_lock_nested+0x1b/0x20
      	 slab_attr_store+0x75/0xd0
      	 sysfs_kf_write+0x45/0x60
      	 kernfs_fop_write+0x13c/0x1c0
      	 __vfs_write+0x28/0x120
      	 vfs_write+0xc8/0x1e0
      	 SyS_write+0x49/0xa0
      	 entry_SYSCALL_64_fastpath+0x1f/0xc2
      
        -> #0 (s_active#120){++++.+}:
      	 __lock_acquire+0x10ed/0x1260
      	 lock_acquire+0xf6/0x1f0
      	 __kernfs_remove+0x254/0x320
      	 kernfs_remove+0x23/0x40
      	 sysfs_remove_dir+0x51/0x80
      	 kobject_del+0x18/0x50
      	 __kmem_cache_shutdown+0x3e6/0x460
      	 kmem_cache_destroy+0x1fb/0x2d0
      	 kvm_exit+0x2d/0x80 [kvm]
      	 vmx_exit+0x19/0xa1b [kvm_intel]
      	 SyS_delete_module+0x198/0x1f0
      	 entry_SYSCALL_64_fastpath+0x1f/0xc2
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(slab_mutex);
      				 lock(s_active#120);
      				 lock(slab_mutex);
          lock(s_active#120);
      
         *** DEADLOCK ***
      
        2 locks held by rmmod/1211:
         #0:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff810a7877>] get_online_cpus+0x37/0x80
         #1:  (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0
      
        stack backtrace:
        CPU: 3 PID: 1211 Comm: rmmod Not tainted 4.10.0-test+ #48
        Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
        Call Trace:
         print_circular_bug+0x1be/0x210
         __lock_acquire+0x10ed/0x1260
         lock_acquire+0xf6/0x1f0
         __kernfs_remove+0x254/0x320
         kernfs_remove+0x23/0x40
         sysfs_remove_dir+0x51/0x80
         kobject_del+0x18/0x50
         __kmem_cache_shutdown+0x3e6/0x460
         kmem_cache_destroy+0x1fb/0x2d0
         kvm_exit+0x2d/0x80 [kvm]
         vmx_exit+0x19/0xa1b [kvm_intel]
         SyS_delete_module+0x198/0x1f0
         ? SyS_delete_module+0x5/0x1f0
         entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      It'd be cleanest to deal with the issue by removing sysfs files
      without holding slab_mutex, before the rest of shutdown; however,
      given the current code structure, it is pretty difficult to do so.
      
      This patch punts sysfs file removal to a work item.  Before commit
      bf5eb3de, the removal was punted to an RCU-delayed work item which is
      executed after release.  Now, we're punting to a different work item
      on shutdown, which still maintains the goal of removing the sysfs
      files earlier, when destroying kmem_caches.
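
      A sketch of the punt, assuming a work item embedded in struct
      kmem_cache (illustrative of the pattern, not a verbatim excerpt):

          static void sysfs_slab_remove_workfn(struct work_struct *work)
          {
                  struct kmem_cache *s = container_of(work, struct kmem_cache,
                                                      kobj_remove_work);

                  kobject_del(&s->kobj);  /* runs later, without slab_mutex held */
          }

          static void sysfs_slab_remove(struct kmem_cache *s)
          {
                  INIT_WORK(&s->kobj_remove_work, sysfs_slab_remove_workfn);
                  schedule_work(&s->kobj_remove_work);
          }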
      
      Link: http://lkml.kernel.org/r/20170620204512.GI21326@htj.duckdns.org
      Fixes: bf5eb3de ("slub: separate out sysfs_slab_release() from sysfs_slab_remove()")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: phy: Support "internal" PHY interface · 735d8a18
      Authored by Florian Fainelli
      Now that the Device Tree binding has been updated, update the PHY
      library phy_interface_t and phy_modes to support the "internal" PHY
      interface type.
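
      A sketch of the addition (its exact position within the enum is an
      assumption):

          enum phy_interface_t {
                  PHY_INTERFACE_MODE_NA,
                  PHY_INTERFACE_MODE_INTERNAL,    /* integrated PHY, no external bus mode */
                  /* ... existing MII/GMII/RGMII/... modes ... */
          };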
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: possibly avoid extra masking for narrower load in verifier · 23994631
      Authored by Yonghong Song
      Commit 31fd8581 ("bpf: permits narrower load from bpf program
      context fields") permits narrower loads for certain ctx fields.
      However, that commit generates a masking operation even when the
      prog-specific ctx conversion already produces a result of the
      narrower size.
      
      For example, for __sk_buff->protocol, the ctx conversion
      loads the data into a register with a 2-byte load.
      A narrower 2-byte load should not generate masking.
      For __sk_buff->vlan_present, the conversion function
      sets the result to either 0 or 1, essentially a byte.
      A narrower 2-byte or 1-byte load should not generate masking either.
      
      To avoid unnecessary masking, the prog-specific *_is_valid_access
      callbacks now pass converted_op_size back to the verifier, which
      indicates the valid data width after the anticipated conversion.
      Based on this information, the verifier is able to avoid
      unnecessary masking.
      
      Since we want more information back from prog-specific
      *_is_valid_access checking, all of them are packed into
      one data structure for more clarity.
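
      A sketch of the packed result structure (the field set follows the
      description; the exact layout is an assumption):

          struct bpf_insn_access_aux {
                  enum bpf_reg_type reg_type;
                  int converted_op_size;  /* width in bytes after ctx conversion */
          };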
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xdp: add reporting of offload mode · ce158e58
      Authored by Jakub Kicinski
      Extend the XDP_ATTACHED_* values to include offloaded mode.
      Let drivers report whether the program is installed in the driver
      or in the HW by changing the prog_attached field from bool to
      u8 (the type of the netlink attribute).

      Exploit the fact that the value of XDP_ATTACHED_DRV is 1: since
      all drivers currently assign the mode with a double negation:
             mode = !!xdp_prog;
      no drivers have to be modified.
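
      The attachment modes as a u8, sketched from the description
      (XDP_ATTACHED_DRV keeps the value 1, so !!xdp_prog still maps to it):

          enum {
                  XDP_ATTACHED_NONE = 0,
                  XDP_ATTACHED_DRV,       /* 1: program runs in the driver */
                  XDP_ATTACHED_SKB,       /* generic/skb mode */
                  XDP_ATTACHED_HW,        /* offloaded to hardware */
          };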
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xdp: add HW offload mode flag for installing programs · ee5d032f
      Authored by Jakub Kicinski
      Add an installation-time flag for requesting that the program
      be installed only if it can be offloaded to HW.
      
      Internally, a new command for ndo_xdp is added; this way we avoid
      putting checks into drivers, since they all return -EINVAL on
      an unknown command.
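
      A sketch of the two pieces (the flag's bit position and the command
      name are assumptions here):

          /* uapi: install only if the program can be offloaded to HW */
          #define XDP_FLAGS_HW_MODE       (1U << 3)

          /* internal ndo_xdp command, alongside the existing ones */
          enum xdp_netdev_command {
                  XDP_SETUP_PROG,
                  XDP_SETUP_PROG_HW,
                  XDP_QUERY_PROG,
          };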
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xdp: pass XDP flags into install handlers · 32d60277
      Authored by Jakub Kicinski
      Pass XDP flags to the xdp ndo.  This will allow drivers to look
      at the mode flags and make decisions about offload.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 22 Jun 2017, 7 commits
  8. 21 Jun 2017, 5 commits
  9. 20 Jun 2017, 4 commits
    • time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting · 3d88d56c
      Authored by John Stultz
      Due to how the MONOTONIC_RAW accumulation logic was handled,
      there is the potential for a 1ns discontinuity when we do
      accumulations. This small discontinuity has for the most part
      gone unnoticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
      in their vDSO clock_gettime implementation, we've seen failures
      with the inconsistency-check test in kselftest.
      
      This patch addresses the issue by using the same sub-ns
      accumulation handling that CLOCK_MONOTONIC uses, which avoids
      the issue for in-kernel users.
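
      A sketch of the shifted (fractional) nanosecond accumulation now
      shared with CLOCK_MONOTONIC (variable names are illustrative of the
      technique, not a verbatim excerpt):

          /* accumulate one interval worth of shifted nanoseconds */
          tk->tkr_raw.xtime_nsec += tk->raw_interval << tk->tkr_raw.shift;

          /* carry whole seconds out, keeping the sub-ns remainder */
          while (tk->tkr_raw.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_raw.shift)) {
                  tk->tkr_raw.xtime_nsec -= (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
                  tk->raw_time.tv_sec++;
          }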
      
      Since the ARM64 vDSO implementation has its own clock_gettime
      calculation logic, this patch reduces the frequency of errors,
      but failures are still seen. The ARM64 vDSO will need to be
      updated to include the sub-nanosecond xtime_nsec values in its
      calculation for this issue to be completely fixed.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • time: Fix clock->read(clock) race around clocksource changes · ceea5e37
      Authored by John Stultz
      In tests which exercise switching of clocksources, a NULL
      pointer dereference can be observed on ARM64 platforms in the
      clocksource read() function:
      
      u64 clocksource_mmio_readl_down(struct clocksource *c)
      {
      	return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
      }
      
      This is called from the core timekeeping code via:
      
      	cycle_now = tkr->read(tkr->clock);
      
      tkr->read is the cached tkr->clock->read() function pointer.
      When the clocksource is changed then tkr->clock and tkr->read
      are updated sequentially. The code above results in a sequential
      load operation of tkr->read and tkr->clock as well.
      
      If the store to tkr->clock hits between the loads of tkr->read
      and tkr->clock, then the old read() function is called with the
      new clock pointer. As a consequence the read() function
      dereferences a different data structure and the resulting 'reg'
      pointer can point anywhere including NULL.
      
      This problem was introduced when the timekeeping code was
      switched over to use struct tk_read_base. Before that, it was
      theoretically possible as well when the compiler decided to
      reload clock in the code sequence:
      
           now = tk->clock->read(tk->clock);
      
      Add a helper function which avoids the issue by reading
      tk_read_base->clock once into a local variable clk and then issuing
      the read function via clk->read(clk). This guarantees that the
      read() function always gets the proper clocksource pointer handed
      in.
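
      A sketch of such a helper, using READ_ONCE() to pin the pointer load
      (the name follows the description; treat it as illustrative):

          static inline u64 tk_clock_read(const struct tk_read_base *tkr)
          {
                  struct clocksource *clock = READ_ONCE(tkr->clock);

                  return clock->read(clock);      /* read() always sees its own clock */
          }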
      
      Since there is now no use for the tkr.read pointer, this patch
      also removes it, and to address stopping the fast timekeeper
      during suspend/resume, it introduces a dummy clocksource to use
      rather than just a dummy read function.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • netfilter: nfnetlink: extended ACK reporting · 04ba724b
      Authored by Pablo Neira Ayuso
      Pass down struct netlink_ext_ack as a parameter to all of our
      nfnetlink subsystem callbacks, so that follow-up patches can provide
      finer-grained error reporting using the new infrastructure that
      2d4bc933 ("netlink: extended ACK reporting") provides.
      
      No functional change, just pass down this new object to callbacks.
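
      A condensed sketch of the callback signature change (parameter lists
      abbreviated):

          /* before */
          int (*call)(struct net *net, struct sock *nl, struct sk_buff *skb,
                      const struct nlmsghdr *nlh, const struct nlattr * const cda[]);

          /* after */
          int (*call)(struct net *net, struct sock *nl, struct sk_buff *skb,
                      const struct nlmsghdr *nlh, const struct nlattr * const cda[],
                      struct netlink_ext_ack *extack);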
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ebt: Use new helper ebt_invalid_target to check target · e15b9c50
      Authored by Gao Feng
      Use the new helper function ebt_invalid_target instead of the old
      macro INVALID_TARGET and other duplicated code, to improve
      readability.
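
      A sketch of the helper (the bounds follow ebtables' convention that
      valid standard verdicts are small negative numbers; the exact form is
      an assumption):

          static inline bool ebt_invalid_target(int target)
          {
                  return target < -NUM_STANDARD_TARGETS || target >= 0;
          }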
      Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  10. 19 Jun 2017, 1 commit
    • mm: larger stack guard gap, between vmas · 1be7107f
      Authored by Hugh Dickins
      The stack guard page is a useful feature to reduce the risk of the
      stack smashing into a different mapping. We have been using a
      single-page gap, which is sufficient to prevent having the stack
      adjacent to a different mapping. But this seems to be insufficient in
      light of the stack usage in userspace, e.g. glibc uses alloca()
      allocations as large as 64kB in many commonly used functions. Others
      use constructs like gid_t buffer[NGROUPS_MAX], which is 256kB, or
      stack strings with MAX_ARG_STRLEN.
      
      This becomes especially dangerous for suid binaries with the default
      unlimited stack size limit, because those applications can be tricked
      into consuming a large portion of the stack, and a single glibc call
      could then jump over the guard page. These attacks are not
      theoretical, unfortunately.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page
      size, because systems with larger base pages might cap stack
      allocations in PAGE_SIZE units), which should cover larger alloca()
      and VLA stack allocations. It is obviously not a full fix because the
      problem is somewhat inherent, but it should reduce the attack space a
      lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      
      Implementation-wise, first delete all the old code for the stack
      guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping the gap inside the stack vma, maintain the stack
      guard gap as a gap between vmas: using vm_start_gap() in place of
      vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just
      those few places which need to respect the gap, mainly
      arch_get_unmapped_area(), and the vma tree's subtree_gap support for
      that.
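
      A sketch of the gap-aware helper for the common VM_GROWSDOWN case,
      following the description (with an underflow check):

          static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
          {
                  unsigned long vm_start = vma->vm_start;

                  if (vma->vm_flags & VM_GROWSDOWN) {
                          vm_start -= stack_guard_gap;
                          if (vm_start > vma->vm_start)   /* subtraction wrapped */
                                  vm_start = 0;
                  }
                  return vm_start;
          }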
      Original-patch-by: Oleg Nesterov <oleg@redhat.com>
      Original-patch-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 16 Jun 2017, 5 commits
    • net: Add IFLA_XDP_PROG_ID · 58038695
      Authored by Martin KaFai Lau
      Expose prog_id through IFLA_XDP_PROG_ID.  This patch
      modifies generic_xdp; later patches will modify the other
      XDP-capable drivers.

      prog_id is added to struct net_dev_xdp.

      An iproute2 patch will follow. Here is how the 'ip link'
      output will look:
      > ip link show eth0
      3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp(prog_id:1) qdisc fq_codel state UP mode DEFAULT group default qlen 1000
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • networking: add and use skb_put_u8() · 634fef61
      Authored by Johannes Berg
      Joe and Bjørn suggested that it'd be nicer to not have the
      cast in the fairly common case of doing
      	*(u8 *)skb_put(skb, 1) = c;
      
      Add skb_put_u8() for this case, and use it across the code,
      using the following spatch:
      
          @@
          expression SKB, C, S;
          typedef u8;
          identifier fn = {skb_put};
          fresh identifier fn2 = fn ## "_u8";
          @@
          - *(u8 *)fn(SKB, S) = C;
          + fn2(SKB, C);
      
      Note that due to the "S", the spatch isn't perfect; it should
      have checked that S is 1, but there are also places that use a
      sizeof expression like sizeof(var) or sizeof(u8) etc. It turns
      out that nobody ever did something like
      	*(u8 *)skb_put(skb, 2) = c;
      
      which would be wrong anyway since the second byte wouldn't be
      initialized.
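
      The helper itself is essentially a one-liner wrapping the pattern
      above (a sketch):

          static inline void skb_put_u8(struct sk_buff *skb, u8 val)
          {
                  *(u8 *)skb_put(skb, 1) = val;
          }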
      Suggested-by: Joe Perches <joe@perches.com>
      Suggested-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • networking: make skb_push & __skb_push return void pointers · d58ff351
      Authored by Johannes Berg
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
          @@
          expression SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - fn(SKB, LEN)[0]
          + *(u8 *)fn(SKB, LEN)
      
      Note that the last part there converts from push(...)[0] to the
      more idiomatic *(u8 *)push(...).
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • networking: make skb_pull & friends return void pointers · af72868b
      Authored by Johannes Berg
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = {
                  skb_pull,
                  __skb_pull,
                  skb_pull_inline,
                  __pskb_pull_tail,
                  __pskb_pull,
                  pskb_pull
          };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = {
                  skb_pull,
                  __skb_pull,
                  skb_pull_inline,
                  __pskb_pull_tail,
                  __pskb_pull,
                  pskb_pull
          };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • networking: make skb_put & friends return void pointers · 4df864c1
      Authored by Johannes Berg
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
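
      For illustration, a typical call site before and after this series
      (hypothetical driver code; struct foo_hdr is made up):

          /* before: a cast was required */
          struct foo_hdr *hdr = (struct foo_hdr *)skb_put(skb, sizeof(*hdr));

          /* after: void * assigns to any pointer type without a cast */
          struct foo_hdr *hdr = skb_put(skb, sizeof(*hdr));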
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>