1. 12 Mar, 2018 2 commits
  2. 08 Mar, 2018 2 commits
    • of: unittest: fix an error test in of_unittest_overlay_8() · bdb7013d
      Dan Carpenter committed
      We changed this from of_overlay_apply() to overlay_data_apply().  The
      overlay_data_apply() function returns 1 on success and 0 on error so
      the check for less than zero needs to be updated.
      
      Fixes: 39a751a4 ("of: change overlay apply input data from unflattened to FDT")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Frank Rowand <frowand.list@gmail.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
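The return-value mismatch described in this commit can be illustrated with a minimal userspace sketch. `fake_overlay_data_apply()` is a hypothetical stand-in, not the kernel function; it only mimics the 1-on-success, 0-on-error convention stated in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for overlay_data_apply(): 1 on success, 0 on error. */
int fake_overlay_data_apply(bool overlay_is_valid)
{
	return overlay_is_valid ? 1 : 0;
}

/* Check written for a negative-errno convention; it never fires for 0 or 1. */
bool failed_errno_style(int ret)
{
	return ret < 0;
}

/* Check that matches the 1-on-success, 0-on-error convention. */
bool failed_boolean_style(int ret)
{
	return ret == 0;
}

/* Small self-check showing why the stale check misses the error. */
int overlay_check_demo(void)
{
	int ret = fake_overlay_data_apply(false);

	assert(!failed_errno_style(ret));  /* the error goes unnoticed */
	assert(failed_boolean_style(ret)); /* the error is detected */
	return 0;
}
```

Calling `overlay_check_demo()` exercises both checks against a simulated failure.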
    • of: cache phandle nodes to reduce cost of of_find_node_by_phandle() · 0b3ce78e
      Frank Rowand committed
      Create a cache of the nodes that contain a phandle property.  Use this
      cache to find the node for a given phandle value instead of scanning
      the devicetree to find the node.  If the phandle value is not found
      in the cache, of_find_node_by_phandle() will fall back to the tree
      scan algorithm.
      
      The cache is initialized in of_core_init().
      
      The cache is freed via a late_initcall_sync() if modules are not
      enabled.
      
      If the devicetree is created by the dtc compiler, with all phandle
      property values auto generated, then the size required by the cache
      could be 4 * (1 + number of phandles) bytes.  This results in an O(1)
      node lookup cost for a given phandle value.  Due to a concern that the
      phandle property values might not be consistent with what is generated
      by the dtc compiler, a mask has been added to the cache lookup algorithm.
      To maintain the O(1) node lookup cost, the size of the cache has been
      increased by rounding the number of entries up to the next power of
      two.
      
      The overhead of finding the devicetree node containing a given phandle
      value has been noted by several people in the recent past, in some cases
      with a patch to add a hashed index of devicetree nodes, based on the
      phandle value of the node.  One concern with this approach is the extra
      space added to each node.  This patch takes advantage of the phandle
      property values auto generated by the dtc compiler, which begin with
      one and monotonically increase by one, resulting in a range of 1..n
      for n phandle values.  This implementation should also provide a good
      reduction of overhead for any range of phandle values that are mostly
      in a monotonic range.
      
      Performance measurements by Chintan Pandya <cpandya@codeaurora.org>
      of several implementations of patches that are similar to this one
      suggest an expected reduction of boot time by ~400 ms for his test
      system.  If the cache size was decreased to 64 entries, the boot
      time was reduced by ~340 ms.  The measurements were on a 4.9.73 kernel
      for arch/arm64/boot/dts/qcom/sda670-mtp.dts, which contains 2371 nodes
      and 814 phandle values.
      Reported-by: Chintan Pandya <cpandya@codeaurora.org>
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
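The masked cache lookup with a tree-scan fallback described in this commit might look roughly like the following userspace sketch. It is not the kernel code: `struct fake_node`, `struct phandle_cache` and every function name here are hypothetical, and the node list is a plain linked list standing in for the devicetree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified node: only the fields this sketch needs. */
struct fake_node {
	uint32_t phandle;          /* 0 means "no phandle property" */
	struct fake_node *allnext; /* links every node for the fallback scan */
};

struct phandle_cache {
	struct fake_node **slots;
	uint32_t mask;             /* slot count - 1; slot count is a power of two */
};

/* Round n up to the next power of two (returns 1 for n == 0). */
static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	assert(n <= (UINT32_C(1) << 31)); /* avoid overflow in the loop below */
	while (p < n)
		p <<= 1;
	return p;
}

/* Fill the cache from all nodes that carry a phandle. Returns 0 or -1. */
int phandle_cache_init(struct phandle_cache *c, struct fake_node *all,
		       uint32_t n_phandles)
{
	uint32_t entries = roundup_pow2(n_phandles);

	c->slots = calloc(entries, sizeof(*c->slots));
	if (!c->slots)
		return -1;
	c->mask = entries - 1;
	for (struct fake_node *np = all; np; np = np->allnext)
		if (np->phandle)
			c->slots[np->phandle & c->mask] = np;
	return 0;
}

struct fake_node *find_node_by_phandle(const struct phandle_cache *c,
				       struct fake_node *all, uint32_t handle)
{
	struct fake_node *np;

	if (!handle)
		return NULL;

	/* Fast path: masked index, then verify the stored phandle. */
	np = c->slots[handle & c->mask];
	if (np && np->phandle == handle)
		return np;

	/* Slow path: fall back to scanning every node. */
	for (np = all; np; np = np->allnext)
		if (np->phandle == handle)
			return np;
	return NULL;
}

/* Self-check with phandles 1, 2, 3 and an out-of-range value 7. */
int phandle_cache_demo(void)
{
	struct fake_node n7 = { 7, NULL };
	struct fake_node n3 = { 3, &n7 };
	struct fake_node n2 = { 2, &n3 };
	struct fake_node n1 = { 1, &n2 };
	struct phandle_cache cache;

	if (phandle_cache_init(&cache, &n1, 4))
		return -1;
	assert(cache.mask == 3);
	assert(find_node_by_phandle(&cache, &n1, 2) == &n2); /* cache hit */
	assert(find_node_by_phandle(&cache, &n1, 7) == &n7); /* 7 & 3 == 3, stored last */
	assert(find_node_by_phandle(&cache, &n1, 3) == &n3); /* slot holds n7: tree scan */
	assert(find_node_by_phandle(&cache, &n1, 9) == NULL);
	free(cache.slots);
	return 0;
}
```

The mask keeps the index in range for any phandle value, and comparing the stored phandle before returning is what makes a colliding or non-sequential value fall through to the scan instead of returning the wrong node.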
  3. 06 Mar, 2018 2 commits
    • of: overlay: do not include path in full_name of added nodes · b89dae18
      Frank Rowand committed
      Struct device_node full_name no longer includes the full path name
      when the devicetree is created from a flattened device tree (FDT).
      The overlay node creation code was not modified to reflect this
      change.  Fix the node full_name generated by overlay code to contain
      only the basename.
      
      Unittests call an overlay internal function to create new nodes.
      Fix up these calls to provide basename only instead of the full
      path.
      
      Fixes: a7e4cfb0 ("of/fdt: only store the device node basename
      in full_name")
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
    • of: unittest: clean up changeset test · a4f91f0d
      Frank Rowand committed
      In preparation for fixing __of_node_dup(), clean up the unittest
      function that calls it.
      
      Devicetree nodes created from a flattened device tree have a name
      property.  Follow this convention for nodes added by a changeset.
      
      For the node added by the changeset, remove the incorrect
      initialization of the child node pointer.
      
      Add an additional node pointer 'changeset' to more naturally reflect
      where in the tree the changeset is added.
      
      Make changeset add property error messages unique.
      
      Add whitespace to break apart logic blocks.
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
  4. 04 Mar, 2018 3 commits
    • of: improve reporting invalid overlay target path · e547c003
      Frank Rowand committed
      Errors while developing the patch to create of_overlay_fdt_apply()
      exposed inadequate error messages to debug problems when overlay
      devicetree fragment nodes contain an invalid target path.  Improve
      the messages in find_target_node() to remedy this.
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
    • of: convert unittest overlay devicetree source to sugar syntax · db2f3762
      Frank Rowand committed
      The unittest-data overlays have been pulled into proper overlay
      devicetree source files without changing their format.  The
      next step is to convert them to use sugar syntax instead of
      hand coding overlay fragments structure.
      
      A few of the overlays cannot be converted because they test
      absolute target paths in the overlay fragment.  dtc does not
      generate this type of target:
        overlay_0.dts
        overlay_1.dts
        overlay_12.dts
        overlay_13.dts
      
      Two pre-existing unittest overlay devicetree source files are
      also converted:
        overlay_bad_phandle.dts
        overlay_bad_symbol.dts
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
    • of: change overlay apply input data from unflattened to FDT · 39a751a4
      Frank Rowand committed
      Move duplicating and unflattening of an overlay flattened devicetree
      (FDT) into the overlay application code.  To accomplish this,
      of_overlay_apply() is replaced by of_overlay_fdt_apply().
      
      The copy of the FDT (aka "duplicate FDT") now belongs to devicetree
      code, which is thus responsible for freeing the duplicate FDT.  The
      caller of of_overlay_fdt_apply() remains responsible for freeing the
      original FDT.
      
      The unflattened devicetree now belongs to devicetree code, which is
      thus responsible for freeing the unflattened devicetree.
      
      These ownership changes prevent early freeing of the duplicated FDT
      or the unflattened devicetree, which could result in use after free
      errors.
      
      of_overlay_fdt_apply() is a private function for the anticipated
      overlay loader.
      
      Update unittest.c to use of_overlay_fdt_apply() instead of
      of_overlay_apply().
      
      Move overlay fragments from artificial locations in
      drivers/of/unittest-data/tests-overlay.dtsi into one devicetree
      source file per overlay.  This led to changes in
      drivers/of/unittest-data/Makefile and drivers/of/unittest.c.
      
        - Add overlay directives to the overlay devicetree source files so
          that dtc will compile them as true overlays into one FDT data
          chunk per overlay.
      
        - Set CFLAGS for drivers/of/unittest-data/testcases.dts so that
          symbols will be generated for overlay resolution of overlays
          that are no longer artificially contained in testcases.dts
      
        - Unflatten and apply each unittest overlay FDT using
          of_overlay_fdt_apply().
      
        - Enable the of_resolve_phandles() check for whether the unflattened
          overlay is detached.  This check was previously disabled because the
          overlays from tests-overlay.dtsi were not unflattened into detached
          trees.
      
        - Other changes to unittest.c infrastructure to manage multiple test
          FDTs built into the kernel image (access by name instead of
          arbitrary number).
      
        - of_unittest_overlay_high_level(): previously unused code to add
          properties from the overlay_base devicetree to the live tree
          was triggered by the restructuring of tests-overlay.dtsi and thus
          testcases.dts.  This exposed two bugs: (1) the need to dup a
          property before adding it, and (2) property 'name' is
          auto-generated in the unflatten code and thus will be a duplicate
          in the __symbols__ node - do not treat this duplicate as an error.
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
  5. 24 Feb, 2018 1 commit
  6. 23 Feb, 2018 5 commits
    • macvlan: fix use-after-free in macvlan_common_newlink() · 4e14bf42
      Alexey Kodanev committed
      The following use-after-free was reported by KASAN when running
      the LTP macvtap01 test on 4.16-rc2:
      
      [10642.528443] BUG: KASAN: use-after-free in
                     macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
      [10642.626607] Read of size 8 at addr ffff880ba49f2100 by task ip/18450
      ...
      [10642.963873] Call Trace:
      [10642.994352]  dump_stack+0x5c/0x7c
      [10643.035325]  print_address_description+0x75/0x290
      [10643.092938]  kasan_report+0x28d/0x390
      [10643.137971]  ? macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
      [10643.207963]  macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
      [10643.275978]  macvtap_newlink+0x171/0x260 [macvtap]
      [10643.334532]  rtnl_newlink+0xd4f/0x1300
      ...
      [10646.256176] Allocated by task 18450:
      [10646.299964]  kasan_kmalloc+0xa6/0xd0
      [10646.343746]  kmem_cache_alloc_trace+0xf1/0x210
      [10646.397826]  macvlan_common_newlink+0x6de/0x14a0 [macvlan]
      [10646.464386]  macvtap_newlink+0x171/0x260 [macvtap]
      [10646.522728]  rtnl_newlink+0xd4f/0x1300
      ...
      [10647.022028] Freed by task 18450:
      [10647.061549]  __kasan_slab_free+0x138/0x180
      [10647.111468]  kfree+0x9e/0x1c0
      [10647.147869]  macvlan_port_destroy+0x3db/0x650 [macvlan]
      [10647.211411]  rollback_registered_many+0x5b9/0xb10
      [10647.268715]  rollback_registered+0xd9/0x190
      [10647.319675]  register_netdevice+0x8eb/0xc70
      [10647.370635]  macvlan_common_newlink+0xe58/0x14a0 [macvlan]
      [10647.437195]  macvtap_newlink+0x171/0x260 [macvtap]
      
      Commit d02fd6e7 ("macvlan: Fix one possible double free") handles
      the case when register_netdevice() invokes ndo_uninit() on error and
      as a result frees the port. But the 'macvlan_port_get_rtnl(dev)' check
      (which returns dev->rx_handler_data), added by that commit in order
      to prevent a double free, is not quite correct:
      
      * for macvlan it always returns NULL because 'lowerdev' is the one that
        was used to register rx handler (port) in macvlan_port_create() as
        well as to unregister it in macvlan_port_destroy().
      * for macvtap it always returns a valid pointer because macvtap registers
        its own rx handler before macvlan_common_newlink().
      
      Fixes: d02fd6e7 ("macvlan: Fix one possible double free")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: aquantia: Fix error handling in aq_pci_probe() · 370c1052
      Dan Carpenter committed
      We should check "self->aq_hw" for allocation failure, and also we should
      free it on the error paths.
      
      Fixes: 23ee07ad ("net: aquantia: Cleanup pci functions module")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
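The two points in this commit, checking the allocation and freeing it on the error paths, follow a common C pattern that can be sketched in userspace as below. All names (`struct fake_nic`, `fake_probe()`, the two boolean knobs) are hypothetical and exist only to simulate failures; this is not the driver's probe routine.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures. */
struct fake_hw { int dummy; };
struct fake_nic { struct fake_hw *aq_hw; };

/*
 * Simplified probe. alloc_fails and later_step_fails are test knobs that
 * simulate an allocation failure and a failure in a later probe step.
 */
int fake_probe(struct fake_nic *self, bool alloc_fails, bool later_step_fails)
{
	int err;

	self->aq_hw = alloc_fails ? NULL : calloc(1, sizeof(*self->aq_hw));
	if (!self->aq_hw)
		return -ENOMEM; /* check the allocation before using it */

	if (later_step_fails) {
		err = -EIO;
		goto err_free_hw;
	}
	return 0;

err_free_hw:
	free(self->aq_hw); /* do not leak on the error path */
	self->aq_hw = NULL;
	return err;
}

/* Self-check covering both failure modes and the success path. */
int probe_demo(void)
{
	struct fake_nic nic = { NULL };

	assert(fake_probe(&nic, true, false) == -ENOMEM);
	assert(nic.aq_hw == NULL);
	assert(fake_probe(&nic, false, true) == -EIO);
	assert(nic.aq_hw == NULL);
	assert(fake_probe(&nic, false, false) == 0);
	assert(nic.aq_hw != NULL);
	free(nic.aq_hw);
	return 0;
}
```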
    • ibmvnic: Fix early release of login buffer · a2c0f039
      Thomas Falcon committed
      The login buffer is released before the driver can perform
      sanity checks between resources the driver requested and what
      firmware will provide. Don't release the login buffer until
      the sanity check is performed.
      
      Fixes: 34f0f4e3 ("ibmvnic: Fix login buffer memory leaks")
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc9194: Remove bogus CONFIG_MAC reference · 83090e7d
      Finn Thain committed
      AFAIK the only version of smc9194.c with Mac support is the one in the
      linux-mac68k CVS repo, which never made it to the mainline.
      
      Despite that, from v2.3.45, arch/m68k/config.in listed CONFIG_SMC9194
      under CONFIG_MAC. This mistake got carried over into Kconfig in v2.5.55.
      (See pre-git era "[PATCH] add m68k dependencies to net driver config".)
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • smsc75xx: fix smsc75xx_set_features() · 88e80c62
      Eric Dumazet committed
      If an attempt is made to disable RX checksums, the USB adapter is
      reconfigured but netdev->features is not updated, because
      smsc75xx_set_features() returns a non-zero value.

      This throws errors from netdev_rx_csum_fault():
      <devname>: hw csum failure
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Steve Glendinning <steve.glendinning@shawell.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 22 Feb, 2018 11 commits
    • i2c: designware: Consider SCL GPIO optional · d1fa7452
      Andy Shevchenko committed
      The GPIO library can return -ENOSYS for a failed request.
      Instead of failing ->probe() in this case, override the error code to 0.
      
      Fixes: ca382f5b ("i2c: designware: add i2c gpio recovery option")
      Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
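Treating the GPIO as optional amounts to mapping one specific error code to success while still propagating every other failure. A minimal sketch, assuming a hypothetical helper `fake_init_recovery()` that receives the simulated result of the GPIO request (it is not the driver's function):

```c
#include <assert.h>
#include <errno.h>

/*
 * gpio_request_result is the (simulated) result of requesting the SCL GPIO:
 * 0 on success or a negative errno value.
 */
int fake_init_recovery(int gpio_request_result)
{
	if (gpio_request_result == -ENOSYS)
		return 0; /* no GPIO support: continue without bus recovery */
	return gpio_request_result; /* success or a real error */
}

/* Self-check: only -ENOSYS is overridden. */
int recovery_demo(void)
{
	assert(fake_init_recovery(0) == 0);
	assert(fake_init_recovery(-ENOSYS) == 0);
	assert(fake_init_recovery(-EBUSY) == -EBUSY);
	return 0;
}
```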
    • i2c: busses: i2c-sirf: Fix spelling: "formular" -> "formula". · c396b9a0
      Patryk Kocielnik committed
      Fix spelling.
      Signed-off-by: Patryk Kocielnik <patryk.kocielnik@gmail.com>
      [wsa: fixed "Initialization", too]
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
    • i2c: bcm2835: Set up the rising/falling edge delays · fe32a815
      Eric Anholt committed
      We were leaving them in the power-on state (or the state the firmware
      had set up for some client, if we were taking over from them).  The
      boot state was 30 core clocks, when we actually want to sample some
      time after (to make sure that the new input bit has actually arrived).
      Signed-off-by: Eric Anholt <eric@anholt.net>
      Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org
    • treewide/trivial: Remove ';;$' typo noise · ed7158ba
      Ingo Molnar committed
      On lkml suggestions were made to split up such trivial typo fixes into per subsystem
      patches:
      
        --- a/arch/x86/boot/compressed/eboot.c
        +++ b/arch/x86/boot/compressed/eboot.c
        @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
                struct efi_uga_draw_protocol *uga = NULL, *first_uga;
                efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
                unsigned long nr_ugas;
        -       u32 *handles = (u32 *)uga_handle;;
        +       u32 *handles = (u32 *)uga_handle;
                efi_status_t status = EFI_INVALID_PARAMETER;
                int i;
      
      This patch is the result of the following script:
      
        $ sed -i 's/;;$/;/g' $(git grep -E ';;$'  | grep "\.[ch]:"  | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
      
      ... followed by manual review to make sure it's all good.
      
      Splitting this up is just crazy talk, let's get this over with and just do it.
      Reported-by: Pavel Machek <pavel@ucw.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • mm, swap, frontswap: fix THP swap if frontswap enabled · 7ba71669
      Huang Ying committed
      It was reported by Sergey Senozhatsky that if THP (Transparent Huge
      Page) and frontswap (via zswap) are both enabled, then when memory runs
      low and swap is triggered, segfaults and memory corruption occur in
      random user space applications, as follows:
      
      kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
       #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
       #1  0x00007fc08889c2f3 malloc (libc.so.6)
       #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
       #3  0x0000560e6005e75c n/a (urxvt)
       #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
       #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
       #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
       #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
       #8  0x0000560e6005cb55 ev_run (urxvt)
       #9  0x0000560e6003b9b9 main (urxvt)
       #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
       #11 0x0000560e6003f9da _start (urxvt)
      
      After bisection, it was found the first bad commit is bd4c82c2 ("mm,
      THP, swap: delay splitting THP after swapped out").
      
      The root cause is as follows:
      
      When the pages are written to the swap device during swap-out in
      swap_writepage(), zswap (frontswap) tries to compress the pages to
      improve performance.  But zswap (frontswap) treats a THP as a normal
      page, so only the head page is saved.  After swapping in, tail pages
      will not be restored to their original contents, causing memory
      corruption in the applications.

      This is fixed by refusing to save the page in the frontswap store
      functions if the page is a THP, so that the THP will be swapped out to
      the swap device.
      
      Another choice is to split THP if frontswap is enabled.  But it is found
      that the frontswap enabling isn't flexible.  For example, if
      CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
      zswap itself isn't enabled.
      
      Frontswap has multiple backends.  To make it easy for one backend to
      enable THP support, the THP check is put in the backend frontswap store
      functions instead of the general interfaces.
      
      Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
      Fixes: bd4c82c2 ("mm, THP, swap: delay splitting THP after swapped out")
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>	[put THP checking in backend]
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: <stable@vger.kernel.org>	[4.14]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
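The fix described in this commit (the backend store function refuses a THP, so the page goes to the swap device instead) can be modelled in a few lines of userspace C. Everything here is a hypothetical simplification: `struct fake_page`, `fake_backend_store()` and `fake_swap_writepage()` are illustrative names, and the -1 return code is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page descriptor: only records whether the page is a THP. */
struct fake_page { bool is_thp; };

enum swap_target { SWAP_TO_FRONTSWAP, SWAP_TO_DEVICE };

/* Backend store: 0 on success, -1 when the page is refused. */
int fake_backend_store(const struct fake_page *page)
{
	if (page->is_thp)
		return -1; /* backend cannot hold all subpages: refuse */
	return 0;
}

/* Write-out path: try the backend first, else use the swap device. */
enum swap_target fake_swap_writepage(const struct fake_page *page)
{
	if (fake_backend_store(page) == 0)
		return SWAP_TO_FRONTSWAP;
	return SWAP_TO_DEVICE;
}

/* Self-check: normal pages go to the backend, THPs go to the device. */
int thp_swap_demo(void)
{
	struct fake_page normal = { false };
	struct fake_page huge = { true };

	assert(fake_swap_writepage(&normal) == SWAP_TO_FRONTSWAP);
	assert(fake_swap_writepage(&huge) == SWAP_TO_DEVICE);
	return 0;
}
```

Placing the check in the backend function rather than in the generic write-out path mirrors the design choice stated in the message: a backend that later gains THP support only has to drop its own check.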
    • amd-xgbe: Restore PCI interrupt enablement setting on resume · cfd092f2
      Tom Lendacky committed
      After resuming from suspend, the PCI device support must re-enable the
      interrupt setting so that interrupts are actually delivered.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP · 8dcc5b0a
      Jesper Dangaard Brouer committed
      When a driver implements the ndo_xdp_xmit() function, there is
      (currently) no generic way to determine whether it is safe to call.
      
      It is e.g. unsafe to call the driver's ndo_xdp_xmit() if it has not
      allocated the needed XDP TX queues yet.  This is the case for
      virtio_net, which first allocates the XDP TX queues once an XDP/bpf
      prog is attached (in virtnet_xdp_set()).

      Thus, a crash will occur for virtio_net when redirecting to the
      ndo_xdp_xmit() of another virtio_net device that has not attached an
      XDP prog.  The sample xdp_redirect_map tries to attach a dummy XDP prog
      to take this into account, but it can also easily fail if the virtio_net
      (or actually the underlying vhost driver) has not allocated enough extra
      queues for the device.

      Allocating more queues is currently a manual configuration step.
      Hint: for libvirt, add the following to the XML:
      
        <driver name='vhost' queues='16'>
          <host mrg_rxbuf='off'/>
          <guest tso4='off' tso6='off' ecn='off' ufo='off'/>
        </driver>
      
      The solution in this patch is to check that the device has loaded an
      XDP/bpf prog before proceeding.  This is similar to the check
      performed in the ixgbe driver.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
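The guard described in this commit can be sketched as a userspace model: transmit is refused unless a program has been attached, because only attaching allocates the TX queues. The structure, the function names and the choice of -ENXIO as the error code are assumptions of this sketch, not a copy of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state for the sketch. */
struct fake_virtnet {
	bool xdp_prog_attached;
	int xdp_tx_queues; /* allocated only when a program is attached */
	int frames_sent;
};

/* Attaching a program is the point where the XDP TX queues get allocated. */
void fake_xdp_set(struct fake_virtnet *dev, int queues)
{
	dev->xdp_tx_queues = queues;
	dev->xdp_prog_attached = true;
}

/* Transmit entry point: bail out early if the device is not ready for XDP. */
int fake_ndo_xdp_xmit(struct fake_virtnet *dev)
{
	if (!dev->xdp_prog_attached)
		return -ENXIO; /* no program, so no XDP TX queues to use */

	assert(dev->xdp_tx_queues > 0);
	dev->frames_sent++;
	return 0;
}

/* Self-check: xmit fails before attach and succeeds afterwards. */
int xdp_xmit_demo(void)
{
	struct fake_virtnet dev = { false, 0, 0 };

	assert(fake_ndo_xdp_xmit(&dev) == -ENXIO);
	assert(dev.frames_sent == 0);
	fake_xdp_set(&dev, 4);
	assert(fake_ndo_xdp_xmit(&dev) == 0);
	assert(dev.frames_sent == 1);
	return 0;
}
```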
    • virtio_net: fix memory leak in XDP_REDIRECT · 11b7d897
      Jesper Dangaard Brouer committed
      XDP_REDIRECT calling xdp_do_redirect() can fail for multiple reasons
      (which can be inspected by tracepoints). The current semantics is that
      on failure the driver calling xdp_do_redirect() must handle freeing or
      recycling the page associated with this frame.  This can be seen as an
      optimization, as drivers usually have an optimized XDP_DROP code path
      for frame recycling in place already.
      
      The virtio_net driver didn't handle the case where xdp_do_redirect()
      failed.  This caused a memory leak, as the page refcnt wasn't
      decremented on failures.

      The function __virtnet_xdp_xmit() did handle one type of failure,
      when the xmit queue is full (virtqueue_add_outbuf() fails), which
      "hides" releasing a refcnt on the page.  Instead, the function
      __virtnet_xdp_xmit() must follow the API of xdp_do_redirect(), which on
      errors leaves it up to the caller to free the page of the failed send
      operation.
      
      Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
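The ownership rule stated in this commit (on a failed redirect, the caller must release the page) can be modelled with a plain reference counter. This is a hypothetical userspace sketch: `struct fake_page`, `fake_xdp_do_redirect()` and `fake_rx_redirect()` are illustrative names, and the failure is injected through a boolean knob.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page with a simple reference count. */
struct fake_page { int refcnt; };

/* Simulated redirect: 0 on success (page ownership moves on), -1 on failure. */
int fake_xdp_do_redirect(struct fake_page *page, bool fail)
{
	(void)page;
	return fail ? -1 : 0;
}

/*
 * RX-side handling: returns true if the frame was handed off. On failure
 * the caller still owns the page and must drop its reference.
 */
bool fake_rx_redirect(struct fake_page *page, bool redirect_fails)
{
	if (fake_xdp_do_redirect(page, redirect_fails)) {
		page->refcnt--; /* release the page instead of leaking it */
		return false;
	}
	return true;
}

/* Self-check: a failed redirect leaves no outstanding reference. */
int redirect_demo(void)
{
	struct fake_page failed = { 1 };
	struct fake_page sent = { 1 };

	assert(!fake_rx_redirect(&failed, true));
	assert(failed.refcnt == 0); /* freed by the caller */
	assert(fake_rx_redirect(&sent, false));
	assert(sent.refcnt == 1);   /* now owned by the redirect target */
	return 0;
}
```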
    • virtio_net: fix XDP code path in receive_small() · 95dbe9e7
      Jesper Dangaard Brouer committed
      When configuring virtio_net to use the code path 'receive_small()',
      in order to get correct XDP_REDIRECT support, I discovered TCP packets
      would get silently dropped when loading an XDP program with action
      XDP_PASS.

      The bug seems to be that receive_small(), when XDP is loaded, checks
      that hdr->hdr.flags is zero, which seems wrong as hdr.flags contains
      the flags VIRTIO_NET_HDR_F_* :
       #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */
       #define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */
      
      TCP got dropped as it had the VIRTIO_NET_HDR_F_DATA_VALID flag set.
      
      The flags that are relevant here are the VIRTIO_NET_HDR_GSO_* flags
      stored in hdr->hdr.gso_type. Thus, the fix is just to check that none
      of the gso_type flags have been set.
      
      Fixes: bb91accf ("virtio-net: XDP support for small buffers")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
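The difference between the old and the new check can be shown with a small userspace sketch. The two `VIRTIO_NET_HDR_F_*` values are quoted from the message above; the `VIRTIO_NET_HDR_GSO_*` values are not in the message and are filled in here from the virtio_net UAPI header as an assumption of the sketch. `struct fake_virtio_net_hdr` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */
#define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */
#define VIRTIO_NET_HDR_GSO_NONE     0 /* Not a GSO frame */
#define VIRTIO_NET_HDR_GSO_TCPV4    1 /* GSO frame, IPv4 TCP */

/* Hypothetical header with only the two fields the check looks at. */
struct fake_virtio_net_hdr {
	uint8_t flags;
	uint8_t gso_type;
};

/* Old check: rejects any packet that has a checksum flag set. */
bool xdp_rejects_old(const struct fake_virtio_net_hdr *hdr)
{
	return hdr->flags != 0;
}

/* New check: rejects only packets that carry a GSO type. */
bool xdp_rejects_new(const struct fake_virtio_net_hdr *hdr)
{
	return hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE;
}

/* Self-check with a checksum-validated TCP packet and a GSO packet. */
int gso_check_demo(void)
{
	struct fake_virtio_net_hdr tcp = { VIRTIO_NET_HDR_F_DATA_VALID,
					   VIRTIO_NET_HDR_GSO_NONE };
	struct fake_virtio_net_hdr gso = { 0, VIRTIO_NET_HDR_GSO_TCPV4 };

	assert(xdp_rejects_old(&tcp));  /* the silent drop from the report */
	assert(!xdp_rejects_new(&tcp)); /* accepted after the fix */
	assert(xdp_rejects_new(&gso));  /* GSO packets are still rejected */
	return 0;
}
```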
    • virtio_net: disable XDP_REDIRECT in receive_mergeable() case · 7324f539
      Jesper Dangaard Brouer committed
      The virtio_net code has three different RX code paths in receive_buf().
      Two of these code paths can handle XDP, but one of them is broken for
      at least XDP_REDIRECT.

      Function(1): receive_big() does not support XDP.
      Function(2): receive_small() supports XDP fully and uses build_skb().
      Function(3): receive_mergeable() has broken XDP_REDIRECT and uses napi_alloc_skb().
      
      The simple explanation is that receive_mergeable() is broken because
      it uses napi_alloc_skb(), which violates XDP, given that XDP assumes
      packet header+data in a single page and enough tail room for
      skb_shared_info.

      The longer explanation is that receive_mergeable() tries to
      work around and satisfy these XDP requirements, e.g. by having a
      function xdp_linearize_page() that allocates and memcpy's RX buffers
      around (in case the packet is scattered across multiple RX buffers).
      This does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only
      because we have not implemented bpf_xdp_adjust_tail yet).

      The XDP_REDIRECT action combined with cpumap is broken and causes
      hard-to-debug crashes.  The main issue is that the RX packet does not
      have the needed tail room (SKB_DATA_ALIGN(skb_shared_info)), causing
      skb_shared_info to overlap the next packet's head room (in which cpumap
      stores info).

      Reproducing depends on the packet payload length and on whether the
      RX buffer size happened to have tail room for skb_shared_info or not.
      To make this even harder to troubleshoot, the RX buffer size is
      changed dynamically at runtime, based on an Exponentially Weighted
      Moving Average (EWMA) over the packet length, when refilling RX rings.
      
      This patch only disables XDP_REDIRECT support in the
      receive_mergeable() case, because it can cause a real crash.

      IMHO we should consider NOT supporting XDP in receive_mergeable() at
      all, because the principles behind XDP are to gain speed by (1) code
      simplicity, (2) sacrificing memory and (3) where possible moving
      runtime checks to setup time.  These principles are clearly being
      violated in receive_mergeable(), which e.g. tracks the average buffer
      size at runtime to reduce memory consumption.
      
      In the longer run, we should consider introducing a separate receive
      function when attaching an XDP program, and also change the memory
      model to be compatible with XDP when attaching an XDP prog.
      
      Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type · f4576587
      Leon Romanovsky committed
      An attempt to modify the XRC_TGT QP type from user space (an
      ibv_xsrq_pingpong invocation) will trigger the following kernel panic.
      It is caused by the fact that such QPs missed uobject initialization.
      
      [   17.408845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [   17.412645] IP: rdma_lookup_put_uobject+0x9/0x50
      [   17.416567] PGD 0 P4D 0
      [   17.419262] Oops: 0000 [#1] SMP PTI
      [   17.422915] CPU: 0 PID: 455 Comm: ibv_xsrq_pingpo Not tainted 4.16.0-rc1+ #86
      [   17.424765] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [   17.427399] RIP: 0010:rdma_lookup_put_uobject+0x9/0x50
      [   17.428445] RSP: 0018:ffffb8c7401e7c90 EFLAGS: 00010246
      [   17.429543] RAX: 0000000000000000 RBX: ffffb8c7401e7cf8 RCX: 0000000000000000
      [   17.432426] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
      [   17.437448] RBP: 0000000000000000 R08: 00000000000218f0 R09: ffffffff8ebc4cac
      [   17.440223] R10: fffff6038052cd80 R11: ffff967694b36400 R12: ffff96769391f800
      [   17.442184] R13: ffffb8c7401e7cd8 R14: 0000000000000000 R15: ffff967699f60000
      [   17.443971] FS:  00007fc29207d700(0000) GS:ffff96769fc00000(0000) knlGS:0000000000000000
      [   17.446623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   17.448059] CR2: 0000000000000048 CR3: 000000001397a000 CR4: 00000000000006b0
      [   17.449677] Call Trace:
      [   17.450247]  modify_qp.isra.20+0x219/0x2f0
      [   17.451151]  ib_uverbs_modify_qp+0x90/0xe0
      [   17.452126]  ib_uverbs_write+0x1d2/0x3c0
      [   17.453897]  ? __handle_mm_fault+0x93c/0xe40
      [   17.454938]  __vfs_write+0x36/0x180
      [   17.455875]  vfs_write+0xad/0x1e0
      [   17.456766]  SyS_write+0x52/0xc0
      [   17.457632]  do_syscall_64+0x75/0x180
      [   17.458631]  entry_SYSCALL_64_after_hwframe+0x21/0x86
      [   17.460004] RIP: 0033:0x7fc29198f5a0
      [   17.460982] RSP: 002b:00007ffccc71f018 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [   17.463043] RAX: ffffffffffffffda RBX: 0000000000000078 RCX: 00007fc29198f5a0
      [   17.464581] RDX: 0000000000000078 RSI: 00007ffccc71f050 RDI: 0000000000000003
      [   17.466148] RBP: 0000000000000000 R08: 0000000000000078 R09: 00007ffccc71f050
      [   17.467750] R10: 000055b6cf87c248 R11: 0000000000000246 R12: 00007ffccc71f300
      [   17.469541] R13: 000055b6cf8733a0 R14: 0000000000000000 R15: 0000000000000000
      [   17.471151] Code: 00 00 0f 1f 44 00 00 48 8b 47 48 48 8b 00 48 8b 40 10 e9 0b 8b 68 00 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 89 f5 <48> 8b 47 48 48 89 fb 40 0f b6 f6 48 8b 00 48 8b 40 20 e8 e0 8a
      [   17.475185] RIP: rdma_lookup_put_uobject+0x9/0x50 RSP: ffffb8c7401e7c90
      [   17.476841] CR2: 0000000000000048
      [   17.477764] ---[ end trace 1dbcc5354071a712 ]---
      [   17.478880] Kernel panic - not syncing: Fatal exception
      [   17.480277] Kernel Offset: 0xd000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      
      Fixes: 2f08ee36 ("RDMA/restrack: don't use uaccess_kernel()")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  8. 21 Feb, 2018 14 commits