1. 10 8月, 2018 1 次提交
  2. 01 8月, 2018 2 次提交
    • B
      NFSv4 client live hangs after live data migration recovery · 0f90be13
      Bill Baker 提交于
      After a live data migration event at the NFS server, the client may send
      I/O requests to the wrong server, causing a live hang due to repeated
      recovery events.  On the wire, this will appear as an I/O request failing
      with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
      NFS4ERR_BADSSESSION is returned because the session ID being used was
      issued by the other server and is not valid at the old server.
      
      The failure is caused by async worker threads having cached the transport
      (xprt) in the rpc_task structure.  After the migration recovery completes,
      the task is redispatched and the task resends the request to the wrong
      server based on the old value still present in tk_xprt.
      
      The solution is to recompute the tk_xprt field of the rpc_task structure
      so that the request goes to the correct server.
      Signed-off-by: NBill Baker <bill.baker@oracle.com>
      Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
      Tested-by: NHelen Chao <helen.chao@oracle.com>
      Fixes: fb43d172 ("SUNRPC: Use the multipath iterator to assign a ...")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      0f90be13
    • D
      sunrpc: Change rpc_print_iostats to rpc_clnt_show_stats and handle rpc_clnt clones · 016583d7
      Dave Wysochanski 提交于
      The existing rpc_print_iostats has a few shortcomings.  First, the naming
      is not consistent with other functions in the kernel that display stats.
      Second, it is really displaying stats for an rpc_clnt structure as it
      displays both xprt stats and per-op stats.  Third, it does not handle
      rpc_clnt clones, which is important for the one in-kernel tree caller
      of this function, the NFS client's nfs_show_stats function.
      
      Fix all of the above by renaming the rpc_print_iostats to
      rpc_clnt_show_stats and looping through any rpc_clnt clones via
      cl_parent.
      
      Once this interface is fixed, this addresses a problem with NFSv4.
      Before this patch, the /proc/self/mountstats always showed incorrect
      counts for NFSv4 lease and session related opcodes such as SEQUENCE,
      RENEW, SETCLIENTID, CREATE_SESSION, etc.  These counts were always 0
      even though many ops would go over the wire.  The reason for this is
      there are multiple rpc_clnt structures allocated for any given NFSv4
      mount, and inside nfs_show_stats() we callled into rpc_print_iostats()
      which only handled one of them, nfs_server->client.  Fix these counts
      by calling sunrpc's new rpc_clnt_show_stats() function, which handles
      cloned rpc_clnt structs and prints the stats together.
      
      Note that one side-effect of the above is that multiple mounts from
      the same NFS server will show identical counts in the above ops due
      to the fact the one rpc_clnt (representing the NFSv4 client state)
      is shared across mounts.
      Signed-off-by: NDave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      016583d7
  3. 31 7月, 2018 1 次提交
  4. 27 7月, 2018 1 次提交
  5. 23 7月, 2018 1 次提交
  6. 22 7月, 2018 3 次提交
  7. 20 7月, 2018 1 次提交
  8. 19 7月, 2018 2 次提交
    • T
      net/mlx5: Fix QP fragmented buffer allocation · d7037ad7
      Tariq Toukan 提交于
      Fix bad alignment of SQ buffer in fragmented QP allocation.
      It should start directly after RQ buffer ends.
      
      Take special care of the end case where the RQ buffer does not occupy
      a whole page. RQ size is a power of two, so would be the case only for
      small RQ sizes (RQ size < PAGE_SIZE).
      
      Fix wrong assignments for sqb->size (mistakenly assigned RQ size),
      and for npages value of RQ and SQ.
      
      Fixes: 3a2f7033 ("net/mlx5: Use order-0 allocations for all WQ types")
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      d7037ad7
    • S
      PCI: OF: Fix I/O space page leak · a5fb9fb0
      Sergei Shtylyov 提交于
      When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY
      driver was left disabled, the kernel crashed with this BUG:
      
        kernel BUG at lib/ioremap.c:72!
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
        Hardware name: Renesas Condor board based on r8a77980 (DT)
        Workqueue: events deferred_probe_work_func
        pstate: 80000005 (Nzcv daif -PAN -UAO)
        pc : ioremap_page_range+0x370/0x3c8
        lr : ioremap_page_range+0x40/0x3c8
        sp : ffff000008da39e0
        x29: ffff000008da39e0 x28: 00e8000000000f07
        x27: ffff7dfffee00000 x26: 0140000000000000
        x25: ffff7dfffef00000 x24: 00000000000fe100
        x23: ffff80007b906000 x22: ffff000008ab8000
        x21: ffff000008bb1d58 x20: ffff7dfffef00000
        x19: ffff800009c30fb8 x18: 0000000000000001
        x17: 00000000000152d0 x16: 00000000014012d0
        x15: 0000000000000000 x14: 0720072007200720
        x13: 0720072007200720 x12: 0720072007200720
        x11: 0720072007300730 x10: 00000000000000ae
        x9 : 0000000000000000 x8 : ffff7dffff000000
        x7 : 0000000000000000 x6 : 0000000000000100
        x5 : 0000000000000000 x4 : 000000007b906000
        x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
        x1 : 0000000040000000 x0 : 00e80000fe100f07
        Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
        Call trace:
         ioremap_page_range+0x370/0x3c8
         pci_remap_iospace+0x7c/0xac
         pci_parse_request_of_pci_ranges+0x13c/0x190
         rcar_pcie_probe+0x4c/0xb04
         platform_drv_probe+0x50/0xbc
         driver_probe_device+0x21c/0x308
         __device_attach_driver+0x98/0xc8
         bus_for_each_drv+0x54/0x94
         __device_attach+0xc4/0x12c
         device_initial_probe+0x10/0x18
         bus_probe_device+0x90/0x98
         deferred_probe_work_func+0xb0/0x150
         process_one_work+0x12c/0x29c
         worker_thread+0x200/0x3fc
         kthread+0x108/0x134
         ret_from_fork+0x10/0x18
        Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)
      
      It turned out that pci_remap_iospace() wasn't undone when the driver's
      probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
      the probe was retried, finally causing the BUG due to trying to remap
      already remapped pages.
      
      Introduce the devm_pci_remap_iospace() managed API and replace the
      pci_remap_iospace() call with it to fix the bug.
      
      Fixes: dbf9826d ("PCI: generic: Convert to DT resource parsing API")
      Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      [lorenzo.pieralisi@arm.com: split commit/updated the commit log]
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>
      a5fb9fb0
  9. 18 7月, 2018 1 次提交
  10. 17 7月, 2018 3 次提交
    • R
      net/ethernet/freescale/fman: fix cross-build error · c1334597
      Randy Dunlap 提交于
        CC [M]  drivers/net/ethernet/freescale/fman/fman.o
      In file included from ../drivers/net/ethernet/freescale/fman/fman.c:35:
      ../include/linux/fsl/guts.h: In function 'guts_set_dmacr':
      ../include/linux/fsl/guts.h:165:2: error: implicit declaration of function 'clrsetbits_be32' [-Werror=implicit-function-declaration]
        clrsetbits_be32(&guts->dmacr, 3 << shift, device << shift);
        ^~~~~~~~~~~~~~~
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Madalin Bucur <madalin.bucur@nxp.com>
      Cc: netdev@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1334597
    • H
      ipv4/igmp: init group mode as INCLUDE when join source group · 6e2059b5
      Hangbin Liu 提交于
      Based on RFC3376 5.1
         If no interface
         state existed for that multicast address before the change (i.e., the
         change consisted of creating a new per-interface record), or if no
         state exists after the change (i.e., the change consisted of deleting
         a per-interface record), then the "non-existent" state is considered
         to have a filter mode of INCLUDE and an empty source list.
      
      Which means a new multicast group should start with state IN().
      
      Function ip_mc_join_group() works correctly for IGMP ASM(Any-Source Multicast)
      mode. It adds a group with state EX() and inits crcount to mc_qrv,
      so the kernel will send a TO_EX() report message after adding group.
      
      But for IGMPv3 SSM(Source-specific multicast) JOIN_SOURCE_GROUP mode, we
      split the group joining into two steps. First we join the group like ASM,
      i.e. via ip_mc_join_group(). So the state changes from IN() to EX().
      
      Then we add the source-specific address with INCLUDE mode. So the state
      changes from EX() to IN(A).
      
      Before the first step sends a group change record, we finished the second
      step. So we will only send the second change record. i.e. TO_IN(A).
      
      Regarding the RFC stands, we should actually send an ALLOW(A) message for
      SSM JOIN_SOURCE_GROUP as the state should mimic the 'IN() to IN(A)'
      transition.
      
      The issue was exposed by commit a052517a ("net/multicast: should not
      send source list records when have filter mode change"). Before this change,
      we used to send both ALLOW(A) and TO_IN(A). After this change we only send
      TO_IN(A).
      
      Fix it by adding a new parameter to init group mode. Also add new wrapper
      functions so we don't need to change too much code.
      
      v1 -> v2:
      In my first version I only cleared the group change record. But this is not
      enough. Because when a new group join, it will init as EXCLUDE and trigger
      an filter mode change in ip/ip6_mc_add_src(), which will clear all source
      addresses' sf_crcount. This will prevent early joined address sending state
      change records if multi source addressed joined at the same time.
      
      In v2 patch, I fixed it by directly initializing the mode to INCLUDE for SSM
      JOIN_SOURCE_GROUP. I also split the original patch into two separated patches
      for IPv4 and IPv6.
      
      Fixes: a052517a ("net/multicast: should not send source list records when have filter mode change")
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e2059b5
    • P
      mm: don't do zero_resv_unavail if memmap is not allocated · d1b47a7c
      Pavel Tatashin 提交于
      Moving zero_resv_unavail before memmap_init_zone(), caused a regression on
      x86-32.
      
      The cause is that we access struct pages before they are allocated when
      CONFIG_FLAT_NODE_MEM_MAP is used.
      
      free_area_init_nodes()
        zero_resv_unavail()
          mm_zero_struct_page(pfn_to_page(pfn)); <- struct page is not alloced
        free_area_init_node()
          if CONFIG_FLAT_NODE_MEM_MAP
            alloc_node_mem_map()
              memblock_virt_alloc_node_nopanic() <- struct page alloced here
      
      On the other hand memblock_virt_alloc_node_nopanic() zeroes all the memory
      that it returns, so we do not need to do zero_resv_unavail() here.
      
      Fixes: e181ae0c ("mm: zero unavailable pages before memmap init")
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Tested-by: NMatt Hart <matt@mattface.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d1b47a7c
  11. 13 7月, 2018 1 次提交
    • S
      net: Don't copy pfmemalloc flag in __copy_skb_header() · 8b700862
      Stefano Brivio 提交于
      The pfmemalloc flag indicates that the skb was allocated from
      the PFMEMALLOC reserves, and the flag is currently copied on skb
      copy and clone.
      
      However, an skb copied from an skb flagged with pfmemalloc
      wasn't necessarily allocated from PFMEMALLOC reserves, and on
      the other hand an skb allocated that way might be copied from an
      skb that wasn't.
      
      So we should not copy the flag on skb copy, and rather decide
      whether to allow an skb to be associated with sockets unrelated
      to page reclaim depending only on how it was allocated.
      
      Move the pfmemalloc flag before headers_start[0] using an
      existing 1-bit hole, so that __copy_skb_header() doesn't copy
      it.
      
      When cloning, we'll now take care of this flag explicitly,
      contravening to the warning comment of __skb_clone().
      
      While at it, restore the newline usage introduced by commit
      b1937227 ("net: reorganize sk_buff for faster
      __copy_skb_header()") to visually separate bytes used in
      bitfields after headers_start[0], that was gone after commit
      a9e419dc ("netfilter: merge ctinfo into nfct pointer storage
      area"), and describe the pfmemalloc flag in the kernel-doc
      structure comment.
      
      This doesn't change the size of sk_buff or cacheline boundaries,
      but consolidates the 15 bits hole before tc_index into a 2 bytes
      hole before csum, that could now be filled more easily.
      Reported-by: NPatrick Talbert <ptalbert@redhat.com>
      Fixes: c93bdd0e ("netvm: allow skb allocation to use PFMEMALLOC reserves")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b700862
  12. 11 7月, 2018 1 次提交
  13. 09 7月, 2018 1 次提交
    • R
      bpf: include errno.h from bpf-cgroup.h · f292b87d
      Roman Gushchin 提交于
      Commit fdb5c453 ("bpf: fix attach type BPF_LIRC_MODE2 dependency
      wrt CONFIG_CGROUP_BPF") caused some build issues, detected by 0-DAY
      kernel test infrastructure.
      
      The problem is that cgroup_bpf_prog_attach/detach/query() functions
      can return -EINVAL error code, which is not defined. Fix this adding
      errno.h to includes.
      
      Fixes: fdb5c453 ("bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF")
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Cc: Sean Young <sean@mess.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f292b87d
  14. 08 7月, 2018 1 次提交
  15. 07 7月, 2018 2 次提交
  16. 04 7月, 2018 2 次提交
  17. 03 7月, 2018 2 次提交
    • N
      compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations · d03db2bc
      Nick Desaulniers 提交于
      Functions marked extern inline do not emit an externally visible
      function when the gnu89 C standard is used. Some KBUILD Makefiles
      overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without
      an explicit C standard specified, the default is gnu11. Since c99, the
      semantics of extern inline have changed such that an externally visible
      function is always emitted. This can lead to multiple definition errors
      of extern inline functions at link time of compilation units whose build
      files have removed an explicit C standard compiler flag for users of GCC
      5.1+ or Clang.
      Suggested-by: NArnd Bergmann <arnd@arndb.de>
      Suggested-by: NH. Peter Anvin <hpa@zytor.com>
      Suggested-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NNick Desaulniers <ndesaulniers@google.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@redhat.com
      Cc: akataria@vmware.com
      Cc: akpm@linux-foundation.org
      Cc: andrea.parri@amarulasolutions.com
      Cc: ard.biesheuvel@linaro.org
      Cc: aryabinin@virtuozzo.com
      Cc: astrachan@google.com
      Cc: boris.ostrovsky@oracle.com
      Cc: brijesh.singh@amd.com
      Cc: caoj.fnst@cn.fujitsu.com
      Cc: geert@linux-m68k.org
      Cc: ghackmann@google.com
      Cc: gregkh@linuxfoundation.org
      Cc: jan.kiszka@siemens.com
      Cc: jarkko.sakkinen@linux.intel.com
      Cc: jpoimboe@redhat.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: kstewart@linuxfoundation.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-kbuild@vger.kernel.org
      Cc: manojgupta@google.com
      Cc: mawilcox@microsoft.com
      Cc: michal.lkml@markovi.net
      Cc: mjg59@google.com
      Cc: mka@chromium.org
      Cc: pombredanne@nexb.com
      Cc: rientjes@google.com
      Cc: rostedt@goodmis.org
      Cc: sedat.dilek@gmail.com
      Cc: thomas.lendacky@amd.com
      Cc: tstellar@redhat.com
      Cc: tweek@google.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: will.deacon@arm.com
      Cc: yamada.masahiro@socionext.com
      Link: http://lkml.kernel.org/r/20180621162324.36656-2-ndesaulniers@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d03db2bc
    • P
      kthread, sched/core: Fix kthread_parkme() (again...) · 1cef1150
      Peter Zijlstra 提交于
      Gaurav reports that commit:
      
        85f1abe0 ("kthread, sched/wait: Fix kthread_parkme() completion issue")
      
      isn't working for him. Because of the following race:
      
      > controller Thread                               CPUHP Thread
      > takedown_cpu
      > kthread_park
      > kthread_parkme
      > Set KTHREAD_SHOULD_PARK
      >                                                 smpboot_thread_fn
      >                                                 set Task interruptible
      >
      >
      > wake_up_process
      >  if (!(p->state & state))
      >                 goto out;
      >
      >                                                 Kthread_parkme
      >                                                 SET TASK_PARKED
      >                                                 schedule
      >                                                 raw_spin_lock(&rq->lock)
      > ttwu_remote
      > waiting for __task_rq_lock
      >                                                 context_switch
      >
      >                                                 finish_lock_switch
      >
      >
      >
      >                                                 Case TASK_PARKED
      >                                                 kthread_park_complete
      >
      >
      > SET Running
      
      Furthermore, Oleg noticed that the whole scheduler TASK_PARKED
      handling is buggered because the TASK_DEAD thing is done with
      preemption disabled, the current code can still complete early on
      preemption :/
      
      So basically revert that earlier fix and go with a variant of the
      alternative mentioned in the commit. Promote TASK_PARKED to special
      state to avoid the store-store issue on task->state leading to the
      WARN in kthread_unpark() -> __kthread_bind().
      
      But in addition, add wait_task_inactive() to kthread_park() to ensure
      the task really is PARKED when we return from kthread_park(). This
      avoids the whole kthread still gets migrated nonsense -- although it
      would be really good to get this done differently.
      Reported-by: NGaurav Kohli <gkohli@codeaurora.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 85f1abe0 ("kthread, sched/wait: Fix kthread_parkme() completion issue")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1cef1150
  18. 02 7月, 2018 2 次提交
    • H
      ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS · 240630e6
      Hans de Goede 提交于
      There have been several reports of LPM related hard freezes about once
      a day on multiple Lenovo 50 series models. Strange enough these reports
      where not disk model specific as LPM issues usually are and some users
      with the exact same disk + laptop where seeing them while other users
      where not seeing these issues.
      
      It turns out that enabling LPM triggers a firmware bug somewhere, which
      has been fixed in later BIOS versions.
      
      This commit adds a new ahci_broken_lpm() function and a new ATA_FLAG_NO_LPM
      for dealing with this.
      
      The ahci_broken_lpm() function contains DMI match info for the 4 models
      which are known to be affected by this and the DMI BIOS date field for
      known good BIOS versions. If the BIOS date is older then the one in the
      table LPM will be disabled and a warning will be printed.
      
      Note the BIOS dates are for known good versions, some older versions may
      work too, but we don't know for sure, the table is using dates from BIOS
      versions for which users have confirmed that upgrading to that version
      makes the problem go away.
      
      Unfortunately I've been unable to get hold of the reporter who reported
      that BIOS version 2.35 fixed the problems on the W541 for him. I've been
      able to verify the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION from an older
      dmidecode, but I don't know the exact BIOS date as reported in the DMI.
      Lenovo keeps a changelog with dates in their release notes, but the
      dates there are the release dates not the build dates which are in DMI.
      So I've chosen to set the date to which we compare to one day past the
      release date of the 2.34 BIOS. I plan to fix this with a follow up
      commit once I've the necessary info.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      240630e6
    • S
      net: fix use-after-free in GRO with ESP · 603d4cf8
      Sabrina Dubroca 提交于
      Since the addition of GRO for ESP, gro_receive can consume the skb and
      return -EINPROGRESS. In that case, the lower layer GRO handler cannot
      touch the skb anymore.
      
      Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted
      some of the gro_receive handlers that can lead to ESP's gro_receive so
      that they wouldn't access the skb when -EINPROGRESS is returned, but
      missed other spots, mainly in tunneling protocols.
      
      This patch finishes the conversion to using skb_gro_flush_final(), and
      adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
      GUE.
      
      Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      603d4cf8
  19. 30 6月, 2018 1 次提交
    • D
      bpf: undo prog rejection on read-only lock failure · 85782e03
      Daniel Borkmann 提交于
      Partially undo commit 9facc336 ("bpf: reject any prog that failed
      read-only lock") since it caused a regression, that is, syzkaller was
      able to manage to cause a panic via fault injection deep in set_memory_ro()
      path by letting an allocation fail: In x86's __change_page_attr_set_clr()
      it was able to change the attributes of the primary mapping but not in
      the alias mapping via cpa_process_alias(), so the second, inner call
      to the __change_page_attr() via __change_page_attr_set_clr() had to split
      a larger page and failed in the alloc_pages() with the artifically triggered
      allocation error which is then propagated down to the call site.
      
      Thus, for set_memory_ro() this means that it returned with an error, but
      from debugging a probe_kernel_write() revealed EFAULT on that memory since
      the primary mapping succeeded to get changed. Therefore the subsequent
      hdr->locked = 0 reset triggered the panic as it was performed on read-only
      memory, so call-site assumptions were infact wrong to assume that it would
      either succeed /or/ not succeed at all since there's no such rollback in
      set_memory_*() calls from partial change of mappings, in other words, we're
      left in a state that is "half done". A later undo via set_memory_rw() is
      succeeding though due to matching permissions on that part (aka due to the
      try_preserve_large_page() succeeding). While reproducing locally with
      explicitly triggering this error, the initial splitting only happens on
      rare occasions and in real world it would additionally need oom conditions,
      but that said, it could partially fail. Therefore, it is definitely wrong
      to bail out on set_memory_ro() error and reject the program with the
      set_memory_*() semantics we have today. Shouldn't have gone the extra mile
      since no other user in tree today infact checks for any set_memory_*()
      errors, e.g. neither module_enable_ro() / module_disable_ro() for module
      RO/NX handling which is mostly default these days nor kprobes core with
      alloc_insn_page() / free_insn_page() as examples that could be invoked long
      after bootup and original 314beb9b ("x86: bpf_jit_comp: secure bpf jit
      against spraying attacks") did neither when it got first introduced to BPF
      so "improving" with bailing out was clearly not right when set_memory_*()
      cannot handle it today.
      
      Kees suggested that if set_memory_*() can fail, we should annotate it with
      __must_check, and all callers need to deal with it gracefully given those
      set_memory_*() markings aren't "advisory", but they're expected to actually
      do what they say. This might be an option worth to move forward in future
      but would at the same time require that set_memory_*() calls from supporting
      archs are guaranteed to be "atomic" in that they provide rollback if part
      of the range fails, once that happened, the transition from RW -> RO could
      be made more robust that way, while subsequent RO -> RW transition /must/
      continue guaranteeing to always succeed the undo part.
      
      Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com
      Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com
      Fixes: 9facc336 ("bpf: reject any prog that failed read-only lock")
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      85782e03
  20. 29 6月, 2018 4 次提交
    • J
      sg: remove ->sg_magic member · 9544bc53
      Jens Axboe 提交于
      This was introduced more than a decade ago when sg chaining was
      added, but we never really caught anything with it. The scatterlist
      entry size can be critical, since drivers allocate it, so remove
      the magic member. Recently it's been triggering allocation stalls
      and failures in NVMe.
      Tested-by: NJordan Glover <Golden_Miller83@protonmail.ch>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9544bc53
    • S
      include/linux/dax.h: dax_iomap_fault() returns vm_fault_t · f77bc3a8
      Souptick Joarder 提交于
      Commit 1c8f4220 ("mm: change return type to vm_fault_t") missed a
      conversion.  It's not a big problem at present because mainline is still
      using
      
      	typedef int vm_fault_t;
      
      Fixes: 1c8f4220 ("mm: change return type to vm_fault_t")
      Link: http://lkml.kernel.org/r/20180620172046.GA27894@jordon-HP-15-Notebook-PCSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f77bc3a8
    • M
      slub: fix failure when we delete and create a slab cache · d50d82fa
      Mikulas Patocka 提交于
      In kernel 4.17 I removed some code from dm-bufio that did slab cache
      merging (commit 21bb1327: "dm bufio: remove code that merges slab
      caches") - both slab and slub support merging caches with identical
      attributes, so dm-bufio now just calls kmem_cache_create and relies on
      implicit merging.
      
      This uncovered a bug in the slub subsystem - if we delete a cache and
      immediatelly create another cache with the same attributes, it fails
      because of duplicate filename in /sys/kernel/slab/.  The slub subsystem
      offloads freeing the cache to a workqueue - and if we create the new
      cache before the workqueue runs, it complains because of duplicate
      filename in sysfs.
      
      This patch fixes the bug by moving the call of kobject_del from
      sysfs_slab_remove_workfn to shutdown_cache.  kobject_del must be called
      while we hold slab_mutex - so that the sysfs entry is deleted before a
      cache with the same attributes could be created.
      
      Running device-mapper-test-suite with:
      
        dmtest run --suite thin-provisioning -n /commit_failure_causes_fallback/
      
      triggered:
      
        Buffer I/O error on dev dm-0, logical block 1572848, async page read
        device-mapper: thin: 253:1: metadata operation 'dm_pool_alloc_data_block' failed: error = -5
        device-mapper: thin: 253:1: aborting current metadata transaction
        sysfs: cannot create duplicate filename '/kernel/slab/:a-0000144'
        CPU: 2 PID: 1037 Comm: kworker/u48:1 Not tainted 4.17.0.snitm+ #25
        Hardware name: Supermicro SYS-1029P-WTR/X11DDW-L, BIOS 2.0a 12/06/2017
        Workqueue: dm-thin do_worker [dm_thin_pool]
        Call Trace:
         dump_stack+0x5a/0x73
         sysfs_warn_dup+0x58/0x70
         sysfs_create_dir_ns+0x77/0x80
         kobject_add_internal+0xba/0x2e0
         kobject_init_and_add+0x70/0xb0
         sysfs_slab_add+0xb1/0x250
         __kmem_cache_create+0x116/0x150
         create_cache+0xd9/0x1f0
         kmem_cache_create_usercopy+0x1c1/0x250
         kmem_cache_create+0x18/0x20
         dm_bufio_client_create+0x1ae/0x410 [dm_bufio]
         dm_block_manager_create+0x5e/0x90 [dm_persistent_data]
         __create_persistent_data_objects+0x38/0x940 [dm_thin_pool]
         dm_pool_abort_metadata+0x64/0x90 [dm_thin_pool]
         metadata_operation_failed+0x59/0x100 [dm_thin_pool]
         alloc_data_block.isra.53+0x86/0x180 [dm_thin_pool]
         process_cell+0x2a3/0x550 [dm_thin_pool]
         do_worker+0x28d/0x8f0 [dm_thin_pool]
         process_one_work+0x171/0x370
         worker_thread+0x49/0x3f0
         kthread+0xf8/0x130
         ret_from_fork+0x35/0x40
        kobject_add_internal failed for :a-0000144 with -EEXIST, don't try to register things with the same name in the same directory.
        kmem_cache_create(dm_bufio_buffer-16) failed with error -17
      
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1806151817130.6333@file01.intranet.prod.int.rdu2.redhat.comSigned-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Reported-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d50d82fa
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  21. 27 6月, 2018 2 次提交
  22. 26 6月, 2018 1 次提交
    • S
      bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF · fdb5c453
      Sean Young 提交于
      If the kernel is compiled with CONFIG_CGROUP_BPF not enabled, it is not
      possible to attach, detach or query IR BPF programs to /dev/lircN devices,
      making them impossible to use. For embedded devices, it should be possible
      to use IR decoding without cgroups or CONFIG_CGROUP_BPF enabled.
      
      This change requires some refactoring, since bpf_prog_{attach,detach,query}
      functions are now always compiled, but their code paths for cgroups need
      moving out. Rather than a #ifdef CONFIG_CGROUP_BPF in kernel/bpf/syscall.c,
      moving them to kernel/bpf/cgroup.c and kernel/bpf/sockmap.c does not
      require #ifdefs since that is already conditionally compiled.
      
      Fixes: f4364dcf ("media: rc: introduce BPF_PROG_LIRC_MODE2")
      Signed-off-by: NSean Young <sean@mess.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      fdb5c453
  23. 25 6月, 2018 4 次提交
    • A
      disable -Wattribute-alias warning for SYSCALL_DEFINEx() · bee20031
      Arnd Bergmann 提交于
      gcc-8 warns for every single definition of a system call entry
      point, e.g.:
      
      include/linux/compat.h:56:18: error: 'compat_sys_rt_sigprocmask' alias between functions of incompatible types 'long int(int,  compat_sigset_t *, compat_sigset_t *, compat_size_t)' {aka 'long int(int,  struct <anonymous> *, struct <anonymous> *, unsigned int)'} and 'long int(long int,  long int,  long int,  long int)' [-Werror=attribute-alias]
        asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
                        ^~~~~~~~~~
      include/linux/compat.h:45:2: note: in expansion of macro 'COMPAT_SYSCALL_DEFINEx'
        COMPAT_SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
        ^~~~~~~~~~~~~~~~~~~~~~
      kernel/signal.c:2601:1: note: in expansion of macro 'COMPAT_SYSCALL_DEFINE4'
       COMPAT_SYSCALL_DEFINE4(rt_sigprocmask, int, how, compat_sigset_t __user *, nset,
       ^~~~~~~~~~~~~~~~~~~~~~
      include/linux/compat.h:60:18: note: aliased declaration here
        asmlinkage long compat_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))\
                        ^~~~~~~~~~
      
      The new warning seems reasonable in principle, but it doesn't
      help us here, since we rely on the type mismatch to sanitize the
      system call arguments. After I reported this as GCC PR82435, a new
      -Wno-attribute-alias option was added that could be used to turn the
      warning off globally on the command line, but I'd prefer to do it a
      little more fine-grained.
      
      Interestingly, turning a warning off and on again inside of
      a single macro doesn't always work, in this case I had to add
      an extra statement inbetween and decided to copy the __SC_TEST
      one from the native syscall to the compat syscall macro.  See
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83256 for more details
      about this.
      
      [paul.burton@mips.com:
        - Rebase atop current master.
        - Split GCC & version arguments to __diag_ignore() in order to match
          changes to the preceding patch.
        - Add the comment argument to match the preceding patch.]
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82435Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPaul Burton <paul.burton@mips.com>
      Tested-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Tested-by: NStafford Horne <shorne@gmail.com>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      bee20031
    • A
      kbuild: add macro for controlling warnings to linux/compiler.h · 8793bb7f
      Arnd Bergmann 提交于
      I have occasionally run into a situation where it would make sense to
      control a compiler warning from a source file rather than doing so from
      a Makefile using the $(cc-disable-warning, ...) or $(cc-option, ...)
      helpers.
      
      The approach here is similar to what glibc uses, using __diag() and
      related macros to encapsulate a _Pragma("GCC diagnostic ...") statement
      that gets turned into the respective "#pragma GCC diagnostic ..." by
      the preprocessor when the macro gets expanded.
      
      Like glibc, I also have an argument to pass the affected compiler
      version, but decided to actually evaluate that one. For now, this
      supports GCC_4_6, GCC_4_7, GCC_4_8, GCC_4_9, GCC_5, GCC_6, GCC_7,
      GCC_8 and GCC_9. Adding support for CLANG_5 and other interesting
      versions is straightforward here. GNU compilers starting with gcc-4.2
      could support it in principle, but "#pragma GCC diagnostic push"
      was only added in gcc-4.6, so it seems simpler to not deal with those
      at all. The same versions show a large number of warnings already,
      so it seems easier to just leave it at that and not do a more
      fine-grained control for them.
      
      The use cases I found so far include:
      
      - turning off the gcc-8 -Wattribute-alias warning inside of the
        SYSCALL_DEFINEx() macro without having to do it globally.
      
      - Reducing the build time for a simple re-make after a change,
        once we move the warnings from ./Makefile and
        ./scripts/Makefile.extrawarn into linux/compiler.h
      
      - More control over the warnings based on other configurations,
        using preprocessor syntax instead of Makefile syntax. This should make
        it easier for the average developer to understand and change things.
      
      - Adding an easy way to turn the W=1 option on unconditionally
        for a subdirectory or a specific file. This has been requested
        by several developers in the past that want to have their subsystems
        W=1 clean.
      
      - Integrating clang better into the build systems. Clang supports
        more warnings than GCC, and we probably want to classify them
        as default, W=1, W=2 etc, but there are cases in which the
        warnings should be classified differently due to excessive false
        positives from one or the other compiler.
      
      - Adding a way to turn the default warnings into errors (e.g. using
        a new "make E=0" tag) while not also turning the W=1 warnings into
        errors.
      
      This patch for now just adds the minimal infrastructure in order to
      do the first of the list above. As the #pragma GCC diagnostic
      takes precedence over command line options, the next step would be
      to convert a lot of the individual Makefiles that set nonstandard
      options to use __diag() instead.
      
      [paul.burton@mips.com:
        - Rebase atop current master.
        - Add __diag_GCC, or more generally __diag_<compiler>, abstraction to
          avoid code outside of linux/compiler-gcc.h needing to duplicate
          knowledge about different GCC versions.
        - Add a comment argument to __diag_{ignore,warn,error} which isn't
          used in the expansion of the macros but serves to push people to
          document the reason for using them - per feedback from Kees Cook.
        - Translate severity to GCC-specific pragmas in linux/compiler-gcc.h
          rather than using GCC-specific in linux/compiler_types.h.
        - Drop all but GCC 8 macros, since we only need to define macros for
          versions that we need to introduce pragmas for, and as of this
          series that's just GCC 8.
        - Capitalize comments in linux/compiler-gcc.h to match the style of
          the rest of the file.
        - Line up macro definitions with tabs in linux/compiler-gcc.h.]
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPaul Burton <paul.burton@mips.com>
      Tested-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Tested-by: NStafford Horne <shorne@gmail.com>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      8793bb7f
    • H
      acpi: Add helper for deactivating memory region · d2d2e3c4
      Heikki Krogerus 提交于
      Sometimes memory resource may be overlapping with
      SystemMemory Operation Region by design, for example if the
      memory region is used as a mailbox for communication with a
      firmware in the system. One occasion of such mailboxes is
      USB Type-C Connector System Software Interface (UCSI).
      
      With regions like that, it is important that the driver is
      able to map the memory with the requirements it has. For
      example, the driver should be allowed to map the memory as
      non-cached memory. However, if the operation region has been
      accessed before the driver has mapped the memory, the memory
      has been marked as write-back by the time the driver is
      loaded. That means the driver will fail to map the memory
      if it expects non-cached memory.
      
      To work around the problem, introducing helper that the
      drivers can use to temporarily deactivate (unmap)
      SystemMemory Operation Regions that overlap with their
      IO memory.
      
      Fixes: 8243edf4 ("usb: typec: ucsi: Add ACPI driver")
      Cc: stable@vger.kernel.org
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NHeikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2d2e3c4
    • B
      HID: core: allow concurrent registration of drivers · 8f732850
      Benjamin Tissoires 提交于
      Detected on the Dell XPS 9365.
      
      The laptop has 2 devices that benefit from the hid-generic auto-unbinding.
      When those 2 devices are presented to the userspace, udev loads both wacom and
      hid-multitouch. When this happens, the code in __hid_bus_reprobe_drivers() is
      called concurrently and the second device gets reprobed twice.
      
      An other bug in the power_supply subsystem prevent to remove the wacom driver
      if it just finished its initialization, which basically kills the wacom node.
      
      [jkosina@suse.cz: reformat changelog a bit]
      Fixes c17a7476 ("HID: core: rewrite the hid-generic automatic unbind")
      Cc: stable@vger.kernel.org # v4.17
      Tested-by: NMario Limonciello <mario.limonciello@dell.com>
      Signed-off-by: NBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      8f732850