1. 03 Aug, 2018 2 commits
  2. 29 Jun, 2018 5 commits
    • sg: remove ->sg_magic member · 9544bc53
      Jens Axboe authored
      This was introduced more than a decade ago when sg chaining was
      added, but we never really caught anything with it. The scatterlist
      entry size can be critical, since drivers allocate it, so remove
      the magic member. Recently it's been triggering allocation stalls
      and failures in NVMe.
      Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9544bc53
    • aio: mark __aio_sigset::sigmask const · 2cd3ae21
      Avi Kivity authored
      io_pgetevents() will not change the signal mask.  Mark it const to make
      it clear and to reduce the need for casts in user code.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Avi Kivity <avi@scylladb.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      [hch: reapply the patch that got incorrectly reverted]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2cd3ae21
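The effect of the const qualifier described in this commit can be sketched in user space. The names below (`demo_aio_sigset`, `demo_mask_contains`, `demo_check`) are hypothetical stand-ins and not the kernel's uapi definitions; the sketch only shows that a caller holding a `const sigset_t *` can fill the structure without a cast once the member is const.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for struct __aio_sigset; this is NOT the kernel's
 * uapi header, only an illustration of a const-qualified mask pointer. */
struct demo_aio_sigset {
	const sigset_t *sigmask;	/* read-only: the callee never modifies it */
	size_t sigsetsize;
};

/* Hypothetical consumer that only reads the mask.
 * Returns 1 if signo is a member of the mask, 0 otherwise. */
int demo_mask_contains(const struct demo_aio_sigset *set, int signo)
{
	return sigismember(set->sigmask, signo) == 1;
}

/* A caller that only has a const pointer needs no cast to fill the struct. */
int demo_check(const sigset_t *mask, int signo)
{
	struct demo_aio_sigset set = {
		.sigmask = mask,
		.sigsetsize = sizeof(*mask),
	};

	return demo_mask_contains(&set, signo);
}
```

If `sigmask` were declared as a plain `sigset_t *`, the initializer in `demo_check()` would discard the const qualifier and the compiler would ask for a cast, which is the user-code burden the commit message refers to.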
    • include/linux/dax.h: dax_iomap_fault() returns vm_fault_t · f77bc3a8
      Souptick Joarder authored
      Commit 1c8f4220 ("mm: change return type to vm_fault_t") missed a
      conversion.  It's not a big problem at present because mainline is still
      using
      
      	typedef int vm_fault_t;
      
      Fixes: 1c8f4220 ("mm: change return type to vm_fault_t")
      Link: http://lkml.kernel.org/r/20180620172046.GA27894@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f77bc3a8
    • slub: fix failure when we delete and create a slab cache · d50d82fa
      Mikulas Patocka authored
      In kernel 4.17 I removed some code from dm-bufio that did slab cache
      merging (commit 21bb1327: "dm bufio: remove code that merges slab
      caches") - both slab and slub support merging caches with identical
      attributes, so dm-bufio now just calls kmem_cache_create and relies on
      implicit merging.
      
      This uncovered a bug in the slub subsystem - if we delete a cache and
      immediately create another cache with the same attributes, it fails
      because of duplicate filename in /sys/kernel/slab/.  The slub subsystem
      offloads freeing the cache to a workqueue - and if we create the new
      cache before the workqueue runs, it complains because of duplicate
      filename in sysfs.
      
      This patch fixes the bug by moving the call of kobject_del from
      sysfs_slab_remove_workfn to shutdown_cache.  kobject_del must be called
      while we hold slab_mutex - so that the sysfs entry is deleted before a
      cache with the same attributes could be created.
      
      Running device-mapper-test-suite with:
      
        dmtest run --suite thin-provisioning -n /commit_failure_causes_fallback/
      
      triggered:
      
        Buffer I/O error on dev dm-0, logical block 1572848, async page read
        device-mapper: thin: 253:1: metadata operation 'dm_pool_alloc_data_block' failed: error = -5
        device-mapper: thin: 253:1: aborting current metadata transaction
        sysfs: cannot create duplicate filename '/kernel/slab/:a-0000144'
        CPU: 2 PID: 1037 Comm: kworker/u48:1 Not tainted 4.17.0.snitm+ #25
        Hardware name: Supermicro SYS-1029P-WTR/X11DDW-L, BIOS 2.0a 12/06/2017
        Workqueue: dm-thin do_worker [dm_thin_pool]
        Call Trace:
         dump_stack+0x5a/0x73
         sysfs_warn_dup+0x58/0x70
         sysfs_create_dir_ns+0x77/0x80
         kobject_add_internal+0xba/0x2e0
         kobject_init_and_add+0x70/0xb0
         sysfs_slab_add+0xb1/0x250
         __kmem_cache_create+0x116/0x150
         create_cache+0xd9/0x1f0
         kmem_cache_create_usercopy+0x1c1/0x250
         kmem_cache_create+0x18/0x20
         dm_bufio_client_create+0x1ae/0x410 [dm_bufio]
         dm_block_manager_create+0x5e/0x90 [dm_persistent_data]
         __create_persistent_data_objects+0x38/0x940 [dm_thin_pool]
         dm_pool_abort_metadata+0x64/0x90 [dm_thin_pool]
         metadata_operation_failed+0x59/0x100 [dm_thin_pool]
         alloc_data_block.isra.53+0x86/0x180 [dm_thin_pool]
         process_cell+0x2a3/0x550 [dm_thin_pool]
         do_worker+0x28d/0x8f0 [dm_thin_pool]
         process_one_work+0x171/0x370
         worker_thread+0x49/0x3f0
         kthread+0xf8/0x130
         ret_from_fork+0x35/0x40
        kobject_add_internal failed for :a-0000144 with -EEXIST, don't try to register things with the same name in the same directory.
        kmem_cache_create(dm_bufio_buffer-16) failed with error -17
      
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1806151817130.6333@file01.intranet.prod.int.rdu2.redhat.com
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reported-by: Mike Snitzer <snitzer@redhat.com>
      Tested-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d50d82fa
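The ordering problem in this commit message can be modelled with a toy user-space registry. Everything below is a hypothetical sketch (the `demo_*` names, the flat name table, the single pending slot) and is not the slub code; it only contrasts a name removal that is deferred to a later work item with one that completes before destroy returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_NAMES 8

/* Toy stand-in for the /sys/kernel/slab/ directory: a flat set of names. */
struct demo_sysfs {
	char names[DEMO_MAX_NAMES][32];
	bool used[DEMO_MAX_NAMES];
	int pending_del;	/* slot whose removal was deferred, or -1 */
};

void demo_init(struct demo_sysfs *s)
{
	memset(s, 0, sizeof(*s));
	s->pending_del = -1;
}

/* Register a name. Returns the slot index, -EEXIST on a duplicate name,
 * or -ENOMEM when the table is full. */
int demo_cache_create(struct demo_sysfs *s, const char *name)
{
	int free_slot = -1;

	for (int i = 0; i < DEMO_MAX_NAMES; i++) {
		if (s->used[i] && strcmp(s->names[i], name) == 0)
			return -EEXIST;
		if (!s->used[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -ENOMEM;
	snprintf(s->names[free_slot], sizeof(s->names[0]), "%s", name);
	s->used[free_slot] = true;
	return free_slot;
}

/* Behaviour before the fix: destroy only queues the name removal. */
void demo_cache_destroy_deferred(struct demo_sysfs *s, int slot)
{
	s->pending_del = slot;
}

/* The deferred work item, running some time after destroy returned. */
void demo_run_workqueue(struct demo_sysfs *s)
{
	if (s->pending_del >= 0) {
		s->used[s->pending_del] = false;
		s->pending_del = -1;
	}
}

/* Behaviour after the fix: the name is gone before destroy returns. */
void demo_cache_destroy_sync(struct demo_sysfs *s, int slot)
{
	s->used[slot] = false;
}
```

In this model, calling `demo_cache_create()` with the same name between `demo_cache_destroy_deferred()` and `demo_run_workqueue()` returns `-EEXIST`, which corresponds to the `-17` reported in the log above; with `demo_cache_destroy_sync()` the same sequence succeeds.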
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds authored
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  3. 27 Jun, 2018 1 commit
  4. 25 Jun, 2018 5 commits
    • disable -Wattribute-alias warning for SYSCALL_DEFINEx() · bee20031
      Arnd Bergmann authored
      gcc-8 warns for every single definition of a system call entry
      point, e.g.:
      
      include/linux/compat.h:56:18: error: 'compat_sys_rt_sigprocmask' alias between functions of incompatible types 'long int(int,  compat_sigset_t *, compat_sigset_t *, compat_size_t)' {aka 'long int(int,  struct <anonymous> *, struct <anonymous> *, unsigned int)'} and 'long int(long int,  long int,  long int,  long int)' [-Werror=attribute-alias]
        asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
                        ^~~~~~~~~~
      include/linux/compat.h:45:2: note: in expansion of macro 'COMPAT_SYSCALL_DEFINEx'
        COMPAT_SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
        ^~~~~~~~~~~~~~~~~~~~~~
      kernel/signal.c:2601:1: note: in expansion of macro 'COMPAT_SYSCALL_DEFINE4'
       COMPAT_SYSCALL_DEFINE4(rt_sigprocmask, int, how, compat_sigset_t __user *, nset,
       ^~~~~~~~~~~~~~~~~~~~~~
      include/linux/compat.h:60:18: note: aliased declaration here
        asmlinkage long compat_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))\
                        ^~~~~~~~~~
      
      The new warning seems reasonable in principle, but it doesn't
      help us here, since we rely on the type mismatch to sanitize the
      system call arguments. After I reported this as GCC PR82435, a new
      -Wno-attribute-alias option was added that could be used to turn the
      warning off globally on the command line, but I'd prefer to do it a
      little more fine-grained.
      
      Interestingly, turning a warning off and on again inside of
      a single macro doesn't always work. In this case I had to add
      an extra statement in between, and decided to copy the __SC_TEST
      one from the native syscall to the compat syscall macro.  See
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83256 for more details
      about this.
      
      [paul.burton@mips.com:
        - Rebase atop current master.
        - Split GCC & version arguments to __diag_ignore() in order to match
          changes to the preceding patch.
        - Add the comment argument to match the preceding patch.]
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82435
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Tested-by: Stafford Horne <shorne@gmail.com>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      bee20031
    • kbuild: add macro for controlling warnings to linux/compiler.h · 8793bb7f
      Arnd Bergmann authored
      I have occasionally run into a situation where it would make sense to
      control a compiler warning from a source file rather than doing so from
      a Makefile using the $(cc-disable-warning, ...) or $(cc-option, ...)
      helpers.
      
      The approach here is similar to what glibc uses, using __diag() and
      related macros to encapsulate a _Pragma("GCC diagnostic ...") statement
      that gets turned into the respective "#pragma GCC diagnostic ..." by
      the preprocessor when the macro gets expanded.
      
      Like glibc, I also have an argument to pass the affected compiler
      version, but decided to actually evaluate that one. For now, this
      supports GCC_4_6, GCC_4_7, GCC_4_8, GCC_4_9, GCC_5, GCC_6, GCC_7,
      GCC_8 and GCC_9. Adding support for CLANG_5 and other interesting
      versions is straightforward here. GNU compilers starting with gcc-4.2
      could support it in principle, but "#pragma GCC diagnostic push"
      was only added in gcc-4.6, so it seems simpler to not deal with those
      at all. The same versions show a large number of warnings already,
      so it seems easier to just leave it at that and not do a more
      fine-grained control for them.
      
      The use cases I found so far include:
      
      - turning off the gcc-8 -Wattribute-alias warning inside of the
        SYSCALL_DEFINEx() macro without having to do it globally.
      
      - Reducing the build time for a simple re-make after a change,
        once we move the warnings from ./Makefile and
        ./scripts/Makefile.extrawarn into linux/compiler.h
      
      - More control over the warnings based on other configurations,
        using preprocessor syntax instead of Makefile syntax. This should make
        it easier for the average developer to understand and change things.
      
      - Adding an easy way to turn the W=1 option on unconditionally
        for a subdirectory or a specific file. This has been requested
        by several developers in the past that want to have their subsystems
        W=1 clean.
      
      - Integrating clang better into the build systems. Clang supports
        more warnings than GCC, and we probably want to classify them
        as default, W=1, W=2 etc, but there are cases in which the
        warnings should be classified differently due to excessive false
        positives from one or the other compiler.
      
      - Adding a way to turn the default warnings into errors (e.g. using
        a new "make E=0" tag) while not also turning the W=1 warnings into
        errors.
      
      This patch for now just adds the minimal infrastructure in order to
      do the first of the list above. As the #pragma GCC diagnostic
      takes precedence over command line options, the next step would be
      to convert a lot of the individual Makefiles that set nonstandard
      options to use __diag() instead.
      
      [paul.burton@mips.com:
        - Rebase atop current master.
        - Add __diag_GCC, or more generally __diag_<compiler>, abstraction to
          avoid code outside of linux/compiler-gcc.h needing to duplicate
          knowledge about different GCC versions.
        - Add a comment argument to __diag_{ignore,warn,error} which isn't
          used in the expansion of the macros but serves to push people to
          document the reason for using them - per feedback from Kees Cook.
        - Translate severity to GCC-specific pragmas in linux/compiler-gcc.h
          rather than using GCC-specific in linux/compiler_types.h.
        - Drop all but GCC 8 macros, since we only need to define macros for
          versions that we need to introduce pragmas for, and as of this
          series that's just GCC 8.
        - Capitalize comments in linux/compiler-gcc.h to match the style of
          the rest of the file.
        - Line up macro definitions with tabs in linux/compiler-gcc.h.]
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Tested-by: Stafford Horne <shorne@gmail.com>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      8793bb7f
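The _Pragma-based approach described in this commit message can be sketched in a few lines of standalone C. The `DEMO_*` and `demo_*` macro names are hypothetical and deliberately simpler than the kernel's `__diag()` family (there is no compiler or version argument here); the sketch assumes a GCC-compatible compiler that understands `#pragma GCC diagnostic`.

```c
/* Stringify in two steps so that macro arguments are expanded first. */
#define DEMO_DIAG_STR1(s)	#s
#define DEMO_DIAG_STR(s)	DEMO_DIAG_STR1(s)

/* Wrap the text in a _Pragma operator, which the preprocessor turns into
 * the corresponding "#pragma GCC diagnostic ..." directive. */
#define DEMO_DIAG(s)		_Pragma(DEMO_DIAG_STR(GCC diagnostic s))

#define demo_diag_push()	DEMO_DIAG(push)
#define demo_diag_pop()		DEMO_DIAG(pop)
/* The comment argument is unused in the expansion; it only records why
 * the warning is being silenced. */
#define demo_diag_ignore(option, comment)	DEMO_DIAG(ignored option)

demo_diag_push()
demo_diag_ignore("-Wunused-variable", "scratch is kept for illustration only")
int demo_answer(void)
{
	int scratch;	/* would be reported by -Wunused-variable under -Wall */

	return 42;
}
demo_diag_pop()
```

Because the pragma is scoped by push and pop, the warning is only suppressed for `demo_answer()`; code after `demo_diag_pop()` is compiled with whatever warning options were given on the command line.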
    • acpi: Add helper for deactivating memory region · d2d2e3c4
      Heikki Krogerus authored
      Sometimes a memory resource may overlap with a
      SystemMemory Operation Region by design, for example if the
      memory region is used as a mailbox for communication with
      firmware in the system. One example of such a mailbox is the
      USB Type-C Connector System Software Interface (UCSI).
      
      With regions like that, it is important that the driver is
      able to map the memory with the requirements it has. For
      example, the driver should be allowed to map the memory as
      non-cached memory. However, if the operation region has been
      accessed before the driver has mapped the memory, the memory
      has been marked as write-back by the time the driver is
      loaded. That means the driver will fail to map the memory
      if it expects non-cached memory.
      
      To work around the problem, introduce a helper that
      drivers can use to temporarily deactivate (unmap)
      SystemMemory Operation Regions that overlap with their
      IO memory.
      
      Fixes: 8243edf4 ("usb: typec: ucsi: Add ACPI driver")
      Cc: stable@vger.kernel.org
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2d2e3c4
    • PM / Domains: Rename opp_node to np · ad6384ba
      Viresh Kumar authored
      The DT node passed here isn't necessarily an OPP node, as this routine
      can also be used for cases where the "required-opps" property is present
      directly in the device's node. Rename it.
      
      This also removes a stale comment.
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ad6384ba
    • PM / Domains: Fix return value of of_genpd_opp_to_performance_state() · 5e03aa61
      Viresh Kumar authored
      of_genpd_opp_to_performance_state() should return 0 for errors, but the
      dummy routine isn't doing that. Fix it.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      5e03aa61
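Why a dummy routine must follow the "0 means error" convention can be shown with a small sketch. The function names are hypothetical, and the unsigned return type and the `-ENODEV` value in the buggy variant are assumptions made for illustration; the commit message itself only states that 0 should be returned for errors.

```c
#include <errno.h>

/* Hypothetical buggy stub: a negative errno converted to an unsigned
 * return type wraps to a large non-zero value. */
unsigned int demo_state_buggy_stub(void)
{
	return -ENODEV;
}

/* Hypothetical fixed stub: 0 signals the error, as callers expect. */
unsigned int demo_state_fixed_stub(void)
{
	return 0;
}

/* A caller that follows the "0 means error" convention. */
int demo_lookup_failed(unsigned int state)
{
	return state == 0;
}
```

With the buggy stub, `demo_lookup_failed()` reports success and the caller would go on to treat the wrapped errno as a valid performance state; with the fixed stub the failure is detected.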
  5. 24 Jun, 2018 1 commit
  6. 23 Jun, 2018 2 commits
    • bdi: Fix another oops in wb_workfn() · 3ee7e869
      Jan Kara authored
      syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to
      wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was
      WB_shutting_down after wb->bdi->dev became NULL. This indicates that
      unregister_bdi() failed to call wb_shutdown() on one of the wb objects.
      
      The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus
      drops bdi's reference to wb structures before going through the list of
      wbs again and calling wb_shutdown() on each of them. This way the loop
      iterating through all wbs can easily miss a wb if that wb has already
      passed through cgwb_remove_from_bdi_list() called from wb_shutdown()
      from cgwb_release_workfn() and as a result fully shutdown bdi although
      wb_workfn() for this wb structure is still running. In fact there are
      also other ways cgwb_bdi_unregister() can race with
      cgwb_release_workfn() leading e.g. to use-after-free issues:
      
      CPU1                            CPU2
                                      cgwb_bdi_unregister()
                                        cgwb_kill(*slot);
      
      cgwb_release()
        queue_work(cgwb_release_wq, &wb->release_work);
      cgwb_release_workfn()
                                        wb = list_first_entry(&bdi->wb_list, ...)
                                        spin_unlock_irq(&cgwb_lock);
        wb_shutdown(wb);
        ...
        kfree_rcu(wb, rcu);
                                        wb_shutdown(wb); -> oops use-after-free
      
      We solve these issues by synchronizing writeback structure shutdown from
      cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That
      way we also no longer need synchronization using WB_shutting_down as the
      mutex provides it for CONFIG_CGROUP_WRITEBACK case and without
      CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from
      bdi_unregister().
      Reported-by: syzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3ee7e869
    • rseq: Avoid infinite recursion when delivering SIGSEGV · 784e0300
      Will Deacon authored
      When delivering a signal to a task that is using rseq, we call into
      __rseq_handle_notify_resume() so that the registers pushed in the
      sigframe are updated to reflect the state of the restartable sequence
      (for example, ensuring that the signal returns to the abort handler if
      necessary).
      
      However, if the rseq management fails due to an unrecoverable fault when
      accessing userspace or certain combinations of RSEQ_CS_* flags, then we
      will attempt to deliver a SIGSEGV. This has the potential for infinite
      recursion if the rseq code continuously fails on signal delivery.
      
      Avoid this problem by using force_sigsegv() instead of force_sig(), which
      is explicitly designed to reset the SEGV handler to SIG_DFL in the case
      of a recursive fault. In doing so, remove rseq_signal_deliver() from the
      internal rseq API and have an optional struct ksignal * parameter to
      rseq_handle_notify_resume() instead.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: peterz@infradead.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: boqun.feng@gmail.com
      Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
      784e0300
  7. 22 Jun, 2018 3 commits
  8. 21 Jun, 2018 5 commits
  9. 20 Jun, 2018 2 commits
    • ACPI / processor: Finish making acpi_processor_ppc_has_changed() void · a507a306
      Brian Norris authored
      Commit bca5f557 "ACPI / processor: Make acpi_processor_ppc_has_changed()
      void" changed one of the declarations of acpi_processor_ppc_has_changed()
      to return void, but the !CPU_FREQ version still returns int. Let's return
      void to be consistent.
      
      Fixes: bca5f557 "ACPI / processor: Make acpi_processor_ppc_has_changed() void"
      Signed-off-by: Brian Norris <briannorris@chromium.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a507a306
    • net/ipv6: respect rcu grace period before freeing fib6_info · 9b0a8da8
      Eric Dumazet authored
      syzbot reported a use-after-free that is caused by fib6_info being
      freed without a proper RCU grace period.
      
      CPU: 0 PID: 1407 Comm: udevd Not tainted 4.17.0+ #39
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       __read_once_size include/linux/compiler.h:188 [inline]
       find_rr_leaf net/ipv6/route.c:705 [inline]
       rt6_select net/ipv6/route.c:761 [inline]
       fib6_table_lookup+0x12b7/0x14d0 net/ipv6/route.c:1823
       ip6_pol_route+0x1c2/0x1020 net/ipv6/route.c:1856
       ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2082
       fib6_rule_lookup+0x211/0x6d0 net/ipv6/fib6_rules.c:122
       ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2110
       ip6_route_output include/net/ip6_route.h:82 [inline]
       icmpv6_xrlim_allow net/ipv6/icmp.c:211 [inline]
       icmp6_send+0x147c/0x2da0 net/ipv6/icmp.c:535
       icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0xa5/0x790 net/ipv6/route.c:2244
       dst_link_failure include/net/dst.h:427 [inline]
       ndisc_error_report+0xd1/0x1c0 net/ipv6/ndisc.c:695
       neigh_invalidate+0x246/0x550 net/core/neighbour.c:892
       neigh_timer_handler+0xaf9/0xde0 net/core/neighbour.c:978
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:404
       exiting_irq arch/x86/include/asm/apic.h:527 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
       </IRQ>
      RIP: 0010:strlen+0x5e/0xa0 lib/string.c:482
      Code: 24 00 74 3b 48 bb 00 00 00 00 00 fc ff df 4c 89 e0 48 83 c0 01 48 89 c2 48 89 c1 48 c1 ea 03 83 e1 07 0f b6 14 1a 38 ca 7f 04 <84> d2 75 23 80 38 00 75 de 48 83 c4 08 4c 29 e0 5b 41 5c 5d c3 48
      RSP: 0018:ffff8801af117850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      RAX: ffff880197f53bd0 RBX: dffffc0000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff81c5b06c RDI: ffff880197f53bc0
      RBP: ffff8801af117868 R08: ffff88019a976540 R09: 0000000000000000
      R10: ffff88019a976540 R11: 0000000000000000 R12: ffff880197f53bc0
      R13: ffff880197f53bc0 R14: ffffffff899e4e90 R15: ffff8801d91c6a00
       strlen include/linux/string.h:267 [inline]
       getname_kernel+0x24/0x370 fs/namei.c:218
       open_exec+0x17/0x70 fs/exec.c:882
       load_elf_binary+0x968/0x5610 fs/binfmt_elf.c:780
       search_binary_handler+0x17d/0x570 fs/exec.c:1653
       exec_binprm fs/exec.c:1695 [inline]
       __do_execve_file.isra.35+0x16fe/0x2710 fs/exec.c:1819
       do_execveat_common fs/exec.c:1866 [inline]
       do_execve fs/exec.c:1883 [inline]
       __do_sys_execve fs/exec.c:1964 [inline]
       __se_sys_execve fs/exec.c:1959 [inline]
       __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f1576a46207
      Code: 77 19 f4 48 89 d7 44 89 c0 0f 05 48 3d 00 f0 ff ff 76 e0 f7 d8 64 41 89 01 eb d8 f7 d8 64 41 89 01 eb df b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 00 8c 2d 00 f7 d8 64 89 02
      RSP: 002b:00007ffff2784568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
      RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1576a46207
      RDX: 0000000001215b10 RSI: 00007ffff2784660 RDI: 00007ffff2785670
      RBP: 0000000000625500 R08: 000000000000589c R09: 000000000000589c
      R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001215b10
      R13: 0000000000000007 R14: 0000000001204250 R15: 0000000000000005
      
      Allocated by task 12188:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
       kmalloc include/linux/slab.h:513 [inline]
       kzalloc include/linux/slab.h:706 [inline]
       fib6_info_alloc+0xbb/0x280 net/ipv6/ip6_fib.c:152
       ip6_route_info_create+0x782/0x2b50 net/ipv6/route.c:3013
       ip6_route_add+0x23/0xb0 net/ipv6/route.c:3154
       ipv6_route_ioctl+0x5a5/0x760 net/ipv6/route.c:3660
       inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
       sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
       sock_ioctl+0x30d/0x680 net/socket.c:1097
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:500 [inline]
       do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
       ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
       __do_sys_ioctl fs/ioctl.c:708 [inline]
       __se_sys_ioctl fs/ioctl.c:706 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 1402:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kfree+0xd9/0x260 mm/slab.c:3813
       fib6_info_destroy+0x29b/0x350 net/ipv6/ip6_fib.c:207
       fib6_info_release include/net/ip6_fib.h:286 [inline]
       __ip6_del_rt_siblings net/ipv6/route.c:3235 [inline]
       ip6_route_del+0x11c4/0x13b0 net/ipv6/route.c:3316
       ipv6_route_ioctl+0x616/0x760 net/ipv6/route.c:3663
       inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
       sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
       sock_ioctl+0x30d/0x680 net/socket.c:1097
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:500 [inline]
       do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
       ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
       __do_sys_ioctl fs/ioctl.c:708 [inline]
       __se_sys_ioctl fs/ioctl.c:706 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801b5df2580
       which belongs to the cache kmalloc-256 of size 256
      The buggy address is located 8 bytes inside of
       256-byte region [ffff8801b5df2580, ffff8801b5df2680)
      The buggy address belongs to the page:
      page:ffffea0006d77c80 count:1 mapcount:0 mapping:ffff8801da8007c0 index:0xffff8801b5df2e40
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006c5cc48 ffffea0007363308 ffff8801da8007c0
      raw: ffff8801b5df2e40 ffff8801b5df2080 0000000100000006 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801b5df2480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b5df2500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      > ffff8801b5df2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
       ffff8801b5df2600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b5df2680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      
      Fixes: a64efe14 ("net/ipv6: introduce fib6_info struct and helpers")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com
      Acked-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b0a8da8
  10. 19 Jun, 2018 4 commits
    • pNFS/flexfiles: Don't tie up all the rpciod threads in resends · 42f86b44
      Trond Myklebust authored
      We do not want to have rpciod threads perform recursive calls into the
      RPC layer since that can deadlock. In particular, having to wait for
      a layoutget can be nasty... We want rather to defer scheduling those
      retries until we're in the rpc_release() callback, since that is
      called from the nfsiod workqueue.
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      42f86b44
    • xen: share start flags between PV and PVH · 1fe83888
      Roger Pau Monne authored
      Use a global variable to store the start flags for both PV and PVH.
      This allows the xen_initial_domain macro to work properly on PVH.
      
      Note that ARM is also switched to use the new variable.
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      1fe83888
    • scsi: target: tcmu: add read length support · 6c3796d1
      bstroesser@ts.fujitsu.com authored
      Generally target core and TCMUser seem to work fine for tape devices and
      media changers.  But there is at least one situation where TCMUser is not
      able to support sequential access device emulation correctly.
      
      The situation is when an initiator sends a SCSI READ CDB with a length that
      is greater than the length of the tape block to read. We can distinguish
      two subcases:
      
      A) The initiator sent the READ CDB with the SILI bit being set.
      
         In this case the sequential access device has to transfer the data from
         the tape block (only the length of the tape block) and transmit a good
         status.  The current interface between TCMUser and the userspace does
         not support reduction of the read data size by the userspace program.
      
         The patch below fixes this subcase by allowing the userspace program to
         specify a reduced data size in read direction.
      
      B) The initiator sent the READ CDB with the SILI bit not being set.
      
         In this case the sequential access device has to transfer the data from
         the tape block as in A), but additionally has to transmit CHECK
         CONDITION with the ILI bit set and NO SENSE in the sensebytes. The
         information field in the sensebytes must contain the residual count.
      
         With the below patch a user space program can specify the real read data
         length and appropriate sensebytes.  TCMUser then uses the se_cmd flag
         SCF_TREAT_READ_AS_NORMAL, to force target core to transmit the real data
         size and the sensebytes.  Note: the flag SCF_TREAT_READ_AS_NORMAL is
         introduced by Lee Duncan's patch "[PATCH v4] target: transport should
         handle st FM/EOM/ILI reads" from Tue, 15 May 2018 18:25:24 -0700.
      Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
      Acked-by: Mike Christie <mchristi@redhat.com>
      Reviewed-by: Lee Duncan <lduncan@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      6c3796d1
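A minimal sketch of subcase B, for illustration only: it shows how a userspace handler might fill fixed-format sense data with the ILI bit set, a NO SENSE key, and the residual count in the information field. The helper name `build_ili_sense` and the constants are hypothetical and are not part of the TCMU interface or of this patch; the byte layout follows the standard fixed-format sense data, and the residual is assumed to be the requested length minus the tape block length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SENSE_LEN          18    /* fixed-format sense buffer size used in this sketch */
#define SENSE_FIXED_VALID  0xF0  /* response code 0x70 plus the VALID bit (0x80) */
#define SENSE_KEY_NO_SENSE 0x00  /* sense key NO SENSE */
#define SENSE_ILI_BIT      0x20  /* ILI flag, bit 5 of byte 2 */

/*
 * Hypothetical helper: build sense data for a READ whose requested length
 * exceeds the tape block length. Returns the residual count.
 */
static uint32_t build_ili_sense(uint8_t sense[SENSE_LEN],
                                uint32_t requested_len, uint32_t block_len)
{
    uint32_t residual;

    /* This sketch only covers the "request longer than block" case. */
    assert(requested_len >= block_len);
    residual = requested_len - block_len;

    memset(sense, 0, SENSE_LEN);
    sense[0] = SENSE_FIXED_VALID;                  /* information field is valid */
    sense[2] = SENSE_KEY_NO_SENSE | SENSE_ILI_BIT; /* NO SENSE with ILI set */
    sense[3] = (uint8_t)(residual >> 24);          /* information field, big-endian */
    sense[4] = (uint8_t)(residual >> 16);
    sense[5] = (uint8_t)(residual >> 8);
    sense[6] = (uint8_t)residual;
    sense[7] = SENSE_LEN - 8;                      /* additional sense length */
    return residual;
}
```

For a 65536-byte READ of a 512-byte tape block, the residual is 65024 (0x0000FE00), so bytes 3 to 6 of the sense buffer hold 00 00 FE 00.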
    • B
      RDMA/core: Save kernel caller name when creating CQ using ib_create_cq() · 7350cdd0
      Bharat Potnuri committed
      A few kernel applications, such as SCST-iSER, create CQs using
      ib_create_cq(). Accessing those CQ structures with the rdma restrack tool
      leads to the NULL pointer dereference below. This patch saves the calling
      kernel module's name, as ib_alloc_cq() already does.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8132ca70>] skip_spaces+0x30/0x30
      PGD 738bac067 PUD 8533f0067 PMD 0
      Oops: 0000 [#1] SMP
      R10: ffff88017fc03300 R11: 0000000000000246 R12: 0000000000000000
      R13: ffff88082fa5a668 R14: ffff88017475a000 R15: 0000000000000000
      FS:  00002b32726582c0(0000) GS:ffff88087fc40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000008491a1000 CR4: 00000000003607e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       [<ffffffffc05af69c>] ? fill_res_name_pid+0x7c/0x90 [ib_core]
       [<ffffffffc05af79f>] fill_res_cq_entry+0xef/0x170 [ib_core]
       [<ffffffffc05af4c4>] res_get_common_dumpit+0x3c4/0x480 [ib_core]
       [<ffffffffc05af5d3>] nldev_res_get_cq_dumpit+0x13/0x20 [ib_core]
       [<ffffffff815bc1e7>] netlink_dump+0x117/0x2e0
       [<ffffffff815bcb8b>] __netlink_dump_start+0x1ab/0x230
       [<ffffffffc059fead>] ibnl_rcv_msg+0x11d/0x1f0 [ib_core]
       [<ffffffffc05af5c0>] ? nldev_res_get_mr_dumpit+0x20/0x20 [ib_core]
       [<ffffffffc059fd90>] ? rdma_nl_multicast+0x30/0x30 [ib_core]
       [<ffffffff815bea49>] netlink_rcv_skb+0xa9/0xc0
       [<ffffffffc05a0018>] ibnl_rcv+0x98/0xb0 [ib_core]
       [<ffffffff815be132>] netlink_unicast+0xf2/0x1b0
       [<ffffffff815be50f>] netlink_sendmsg+0x31f/0x6a0
       [<ffffffff8156b580>] sock_sendmsg+0xb0/0xf0
       [<ffffffff816ace9e>] ? _raw_spin_unlock_bh+0x1e/0x20
       [<ffffffff8156f998>] ? release_sock+0x118/0x170
       [<ffffffff8156b731>] SYSC_sendto+0x121/0x1c0
       [<ffffffff81568340>] ? sock_alloc_file+0xa0/0x140
       [<ffffffff81221265>] ? __fd_install+0x25/0x60
       [<ffffffff8156c2ce>] SyS_sendto+0xe/0x10
       [<ffffffff816b6c2a>] system_call_fastpath+0x16/0x1b
      RIP  [<ffffffff8132ca70>] skip_spaces+0x30/0x30
      RSP <ffff88072be97760>
      CR2: 0000000000000000
      
      Cc: <stable@vger.kernel.org>
      Fixes: f66c8ba4 ("RDMA/core: Save kernel caller name when creating PD and CQ objects")
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      7350cdd0
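A userspace sketch of the general pattern, not the patch itself: a macro wrapper passes the caller's module name into the creation function so that a later reporting path never sees a NULL name. All `demo_*` names are hypothetical. `KBUILD_MODNAME` is defined by the kernel build system for each module; the fallback definition here is an assumption that lets the sketch compile outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the per-module name that the kernel build system provides. */
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME "demo_module"
#endif

/* Hypothetical tracked object; kern_name is what a reporting tool reads. */
struct demo_cq {
    const char *kern_name;
};

/* The creation function takes the caller's name explicitly. */
static void demo_create_cq_named(struct demo_cq *cq, const char *caller)
{
    assert(caller != NULL);
    cq->kern_name = caller;
}

/* Callers use the macro, so the name of the calling module is filled in. */
#define demo_create_cq(cq) demo_create_cq_named((cq), KBUILD_MODNAME)

/* A reporting path like this would dereference NULL if the name were unset. */
static size_t demo_name_len(const struct demo_cq *cq)
{
    return strlen(cq->kern_name);
}
```

Because the macro expands in the caller's translation unit, each module records its own name without changing any call sites.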
  11. 17 Jun, 2018 2 commits
    • S
      firmware: dmi: Add access to the SKU ID string · b23908d3
      Simon Glass committed
      This is used in some systems from user space for determining the identity
      of the device.
      
      Expose this as a file so that user-space tools don't need to read
      from /sys/firmware/dmi/tables/DMI
      Signed-off-by: Simon Glass <sjg@chromium.org>
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      b23908d3
    • D
      atm: Preserve value of skb->truesize when accounting to vcc · 9bbe60a6
      David Woodhouse committed
      ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
      which they are to be sent. But it doesn't take ownership of those
      packets from the sock (if any) which originally owned them. They should
      remain owned by their actual sender until they've left the box.
      
      There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
      for certain skbs, precisely to avoid messing up sk_wmem_alloc
      accounting. Ideally that hack would cover the ATM use case too, but it
      doesn't — skbs which aren't owned by any sock, for example PPP control
      frames, still get their truesize adjusted when the low-level ATM driver
      adds headroom.
      
      This has always been an issue, it seems. The truesize of a packet
      increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
      for normal traffic, only for control frames. So I think we just got away
      with it, and we probably needed to send 2GiB of LCP echo frames before
      the misaccounting would ever have caused a problem and caused
      atm_may_send() to start refusing packets.
      
      Commit 14afee4b ("net: convert sock.sk_wmem_alloc from atomic_t to
      refcount_t") did exactly what it was intended to do, and turned this
      mostly-theoretical problem into a real one, causing PPPoATM to fail
      immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
      starts refusing to allow new packets.
      
      The least intrusive solution to this problem is to stash the value of
      skb->truesize that was accounted to the VCC, in a new member of the
      ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
      value instead of the then-current value of skb->truesize.
      
      Fixes: 158f323b ("net: adjust skb->truesize in pskb_expand_head()")
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9bbe60a6
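The accounting fix described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code: the `demo_*` types and functions are hypothetical stand-ins for the skb, the VCC, and the account and pop paths, and a plain counter replaces sk_wmem_alloc. The point it illustrates is that the amount subtracted on completion is the amount that was charged, even if truesize grew in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb with the stashed accounting value. */
struct demo_skb {
    unsigned int truesize;      /* may grow while the packet is in flight */
    unsigned int acct_truesize; /* value that was charged to the VCC */
};

/* Hypothetical stand-in for the VCC's in-flight byte counter. */
struct demo_vcc {
    long wmem_alloc;
};

/* Charge the packet to the VCC and remember exactly how much was charged. */
static void demo_account_tx(struct demo_vcc *vcc, struct demo_skb *skb)
{
    vcc->wmem_alloc += skb->truesize;
    skb->acct_truesize = skb->truesize;
}

/* Models a driver adding headroom, which increases truesize. */
static void demo_expand_head(struct demo_skb *skb, unsigned int extra)
{
    skb->truesize += extra;
}

/* On completion, subtract the stashed value, not the current truesize. */
static void demo_pop(struct demo_vcc *vcc, const struct demo_skb *skb)
{
    assert(vcc->wmem_alloc >= (long)skb->acct_truesize);
    vcc->wmem_alloc -= skb->acct_truesize;
}
```

If a packet is charged at a truesize of 768 and a driver then adds 256 bytes of headroom, subtracting the current truesize (1024) would drive the counter to -256, while subtracting the stashed 768 returns it to zero.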
  12. 16 Jun, 2018 6 commits
  13. 15 Jun, 2018 2 commits
    • D
      afs: Display manually added cells in dynamic root mount · 0da0b7fd
      David Howells committed
      Alter the dynroot mount so that cells created by manipulation of
      /proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
      cell as a module parameter will cause directories for those cells to be
      created in the dynamic root superblock for the network namespace[*].
      
      To this end:
      
       (1) Only one dynamic root superblock is now created per network namespace
           and this is shared between all attempts to mount it.  This makes it
           easier to find the superblock to modify.
      
       (2) When a dynamic root superblock is created, the list of cells is walked
           and directories created for each cell already defined.
      
       (3) When a new cell is added, if a dynamic root superblock exists, a
           directory is created for it.
      
       (4) When a cell is destroyed, the directory is removed.
      
       (5) These directories are created by calling lookup_one_len() on the root
           dir which automatically creates them if they don't exist.
      
      [*] Inasmuch as network namespaces are currently supported here.
      Signed-off-by: David Howells <dhowells@redhat.com>
      0da0b7fd
    • C
      block: remov blk_queue_invalidate_tags · be7f99c5
      Christoph Hellwig committed
      This function is entirely unused, so remove it and the tag_queue_busy
      member of struct request_queue.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      be7f99c5