1. 17 Dec, 2017 3 commits
  2. 15 Dec, 2017 6 commits
  3. 12 Dec, 2017 3 commits
    • compiler.h: Remove ACCESS_ONCE() · b899a850
      Committed by Mark Rutland
      There are no longer any kernelspace uses of ACCESS_ONCE(), so we can
      remove the definition from <linux/compiler.h>.
      
      This patch removes the ACCESS_ONCE() definition, and updates comments
      which referred to it. At the same time, some inconsistent and redundant
      whitespace is removed from comments.
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: apw@canonical.com
      Link: http://lkml.kernel.org/r/20171127103824.36526-4-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Remove the cross-release locking checks · e966eaee
      Committed by Ingo Molnar
      This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y),
      while it found a number of old bugs initially, was also causing too many
      false positives that caused people to disable lockdep - which is arguably
      a worse overall outcome.
      
      If we disable cross-release by default but keep the code upstream then
      in practice the most likely outcome is that we'll allow the situation
      to degrade gradually, by allowing entropy to introduce more and more
      false positives, until it overwhelms maintenance capacity.
      
      Another bad side effect was that people were trying to work around
      the false positives by uglifying/complicating unrelated code. There's
      a marked difference between annotating locking operations and
      uglifying good code just due to bad lock debugging code ...
      
      This gradual decrease in quality happened to a number of debugging
      facilities in the kernel, and lockdep is pretty complex already,
      so we cannot risk this outcome.
      
      Either cross-release checking can be done right with no false positives,
      or it should not be included in the upstream kernel.
      
      ( Note that it might make sense to maintain it out of tree and go through
        the false positives every now and then and see whether new bugs were
        introduced. )
      
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y · d89c7035
      Committed by Will Deacon
      When CONFIG_GENERIC_LOCKBREAK=y, locking structures grow an extra int ->break_lock
      field which is used to implement raw_spin_is_contended() by setting the field
      to 1 when waiting on a lock and clearing it to zero when holding a lock.
      However, there are a few problems with this approach:
      
        - There is a write-write race between a CPU successfully taking the lock
          (and subsequently writing break_lock = 0) and a waiter waiting on
          the lock (and subsequently writing break_lock = 1). This could result
          in a contended lock being reported as uncontended and vice-versa.
      
        - On machines with store buffers, nothing guarantees that the writes
          to break_lock are visible to other CPUs at any particular time.
      
        - READ_ONCE/WRITE_ONCE are not used, so the field is potentially
          susceptible to harmful compiler optimisations.
      
      Consequently, the usefulness of this field is unclear and we'd be better off
      removing it and allowing architectures to implement raw_spin_is_contended() by
      providing a definition of arch_spin_is_contended(), as they can when
      CONFIG_GENERIC_LOCKBREAK=n.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1511894539-7988-3-git-send-email-will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 11 Dec, 2017 2 commits
  5. 09 Dec, 2017 1 commit
  6. 08 Dec, 2017 2 commits
  7. 07 Dec, 2017 2 commits
  8. 06 Dec, 2017 2 commits
    • net: remove hlist_nulls_add_tail_rcu() · d7efc6c1
      Committed by Eric Dumazet
      Alexander Potapenko reported use of uninitialized memory [1]
      
      This happens when inserting a request socket into TCP ehash,
      in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.
      
      Bug was added by commit d894ba18 ("soreuseport: fix ordering for
      mixed v4/v6 sockets")
      
      Note that d296ba60 ("soreuseport: Resolve merge conflict for v4/v6
      ordering fix") missed the opportunity to get rid of
      hlist_nulls_add_tail_rcu() :
      
      Both UDP sockets and TCP/DCCP listeners no longer use
      __sk_nulls_add_node_rcu() for their hash insertion.
      
      Since all other sockets have a unique 4-tuple, the reuseport status
      has no special meaning, so we can always use hlist_nulls_add_head_rcu()
      for them and save a few cycles/instructions.
      
      [1]
      
      ==================================================================
      BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x185/0x1d0 lib/dump_stack.c:52
       kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
       __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
       __sk_nulls_add_node_rcu ./include/net/sock.h:684
       inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
       reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
       inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
       tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
       tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
       tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
       tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
       tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
       ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
       NF_HOOK ./include/linux/netfilter.h:248
       ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
       dst_input ./include/net/dst.h:477
       ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
       NF_HOOK ./include/linux/netfilter.h:248
       ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
       __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
       __netif_receive_skb net/core/dev.c:4336
       netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
       napi_skb_finish net/core/dev.c:4858
       napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
       e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
       e1000_clean_rx_irq+0x1492/0x1d30 drivers/net/ethernet/intel/e1000/e1000_main.c:4474
       e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
       napi_poll net/core/dev.c:5500
       net_rx_action+0x73c/0x1820 net/core/dev.c:5566
       __do_softirq+0x4b4/0x8dd kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364
       irq_exit+0x203/0x240 kernel/softirq.c:405
       exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
       do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
       common_interrupt+0x86/0x86
      
      Fixes: d894ba18 ("soreuseport: fix ordering for mixed v4/v6 sockets")
      Fixes: d296ba60 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Alexander Potapenko <glider@google.com>
      Acked-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • x86,kvm: move qemu/guest FPU switching out to vcpu_run · f775b13e
      Committed by Rik van Riel
      Currently, every time a VCPU is scheduled out, the host kernel will
      first save the guest FPU/xstate context, then load the qemu userspace
      FPU context, only to then immediately save the qemu userspace FPU
      context back to memory. When scheduling in a VCPU, the same extraneous
      FPU loads and saves are done.
      
      This could be avoided by moving from a model where the guest FPU is
      loaded and stored with preemption disabled, to a model where the
      qemu userspace FPU is swapped out for the guest FPU context for
      the duration of the KVM_RUN ioctl.
      
      This is done under the VCPU mutex, which is also taken when other
      tasks inspect the VCPU FPU context, so the code should already be
      safe for this change. That should come as no surprise, given that
      s390 already has this optimization.
      
      This can fix a bug where KVM calls get_user_pages while owning the
      FPU, and the file system ends up requesting the FPU again:
      
          [258270.527947]  __warn+0xcb/0xf0
          [258270.527948]  warn_slowpath_null+0x1d/0x20
          [258270.527951]  kernel_fpu_disable+0x3f/0x50
          [258270.527953]  __kernel_fpu_begin+0x49/0x100
          [258270.527955]  kernel_fpu_begin+0xe/0x10
          [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
          [258270.527961]  crypto_shash_update+0x3f/0x110
          [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
          [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
          [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
          [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
          [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
          [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
          [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
          [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
          [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
          [258270.528002]  shrink_slab.part.40+0x1f5/0x420
          [258270.528004]  shrink_node+0x22c/0x320
          [258270.528006]  do_try_to_free_pages+0xf5/0x330
          [258270.528008]  try_to_free_pages+0xe9/0x190
          [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
          [258270.528011]  __alloc_pages_nodemask+0x209/0x260
          [258270.528014]  alloc_pages_vma+0x1f1/0x250
          [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
          [258270.528021]  handle_mm_fault+0xfd3/0x1330
          [258270.528025]  __get_user_pages+0x113/0x640
          [258270.528027]  get_user_pages+0x4f/0x60
          [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
          [258270.528108]  try_async_pf+0x66/0x230 [kvm]
          [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
          [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
          [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
          [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
      
      No performance changes were detected in quick ping-pong tests on
      my 4 socket system, which is expected since an FPU+xstate load is
      on the order of 0.1us, while ping-ponging between CPUs is on the
      order of 20us, and somewhat noisy.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [Fixed a bug where reset_vcpu called put_fpu without a preceding load_fpu,
       which happened inside the KVM_CREATE_VCPU ioctl. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  9. 05 Dec, 2017 3 commits
    • bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type · c895f6f7
      Committed by Hendrik Brueckner
      Commit 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
      program type") introduced the bpf_perf_event_data structure which
      exports the pt_regs structure.  This is OK for multiple architectures
      but fails for s390 and arm64, which do not export pt_regs.  Programs
      using it, for example the bpf selftests, fail to compile on these
      architectures.
      
      For s390, exporting the pt_regs is not an option because s390 wants
      to allow changes to it.  For arm64, there is a user_pt_regs structure
      that covers parts of the pt_regs structure for use by user space.
      
      To solve the broken uapi for s390 and arm64, introduce an abstract
      type for pt_regs and add an asm/bpf_perf_event.h file that defines
      the concrete type.  An asm-generic header file covers the architectures
      that export pt_regs today.
      
      The arch-specific enablement for s390 and arm64 follows in separate
      commits.
      Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Fixes: 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
      Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Revert "cpuset: Make cpuset hotplug synchronous" · 11db855c
      Committed by Tejun Heo
      This reverts commit 1599a185.
      
      This and the previous commit led to another circular locking scenario
      and the scenario which is fixed by this commit no longer exists after
      e8b3f8db ("workqueue/hotplug: simplify workqueue_offline_cpu()")
      which removes work item flushing from hotplug path.
      
      Revert it for now.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • irqdesc: Use bool return type instead of int · 4ce413d1
      Committed by Will Deacon
      The irq_balancing_disabled and irq_is_percpu{,_devid} functions are
      clearly intended to return bool like the functions in
      kernel/irq/settings.h, but actually return an int containing a masked
      value of desc->status_use_accessors. This can lead to subtle breakage
      if, for example, the return value is subsequently truncated when
      assigned to a narrower type.
      
      As Linus points out:
      
      | In particular, what can (and _has_ happened) is that people end up
      | using these functions that return true or false, and they assign the
      | result to something like a bitfield (or a char) or whatever.
      |
      | And the code looks *obviously* correct, when you have things like
      |
      |      dev->percpu = irq_is_percpu_devid(dev->irq);
      |
      | and that "percpu" thing is just one status bit among many. It may even
      | *work*, because maybe that "percpu" flag ends up not being all that
      | important, or it just happens to never be set on the particular
      | hardware that people end up testing.
      |
      | But while it looks obviously correct, and might even work, it's really
      | fundamentally broken. Because that "true or false" function didn't
      | actually return 0/1, it returned 0 or 0x20000.
      |
      | And 0x20000 may not fit in a bitmask or a "char" or whatever.
      
      Fix the problem by consistently using bool as the return type for these
      functions.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: marc.zyngier@arm.com
      Link: https://lkml.kernel.org/r/1512142179-24616-1-git-send-email-will.deacon@arm.com
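The truncation described in the irqdesc commit above can be reproduced in a few lines of user-space C. This is only a sketch: the `sketch_*` names and the `struct sketch_dev` layout are invented for illustration, and the only value taken from the message is the 0x20000 mask that Linus quotes.

```c
#include <stdbool.h>

/* Status bit with the 0x20000 value quoted in the message; name is illustrative. */
#define SKETCH_PER_CPU_DEVID 0x20000u

/* A device structure where "percpu" is just one status bit among many. */
struct sketch_dev {
	unsigned int percpu : 1;
	unsigned int other  : 7;
};

/* Old style: returns the masked value itself, i.e. 0 or 0x20000. */
static int sketch_is_percpu_int(unsigned int status)
{
	return status & SKETCH_PER_CPU_DEVID;
}

/* New style: conversion to bool normalises any non-zero value to 1. */
static bool sketch_is_percpu_bool(unsigned int status)
{
	return status & SKETCH_PER_CPU_DEVID;
}

/* Store the int result in the one-bit field and report what survived. */
static unsigned int sketch_store_int(unsigned int status)
{
	struct sketch_dev dev = { 0 };

	dev.percpu = sketch_is_percpu_int(status);	/* 0x20000 is reduced modulo 2 */
	return dev.percpu;
}

/* Store the bool result in the same one-bit field. */
static unsigned int sketch_store_bool(unsigned int status)
{
	struct sketch_dev dev = { 0 };

	dev.percpu = sketch_is_percpu_bool(status);	/* true is stored as 1 */
	return dev.percpu;
}
```

With the flag set in `status`, the int-returning helper leaves the bit-field at 0 while the bool-returning helper leaves it at 1, which is the "looks obviously correct" failure the message warns about.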
  10. 04 Dec, 2017 1 commit
  11. 02 Dec, 2017 2 commits
    • iio: stm32: fix adc/trigger link error · 6d745ee8
      Committed by Arnd Bergmann
      The ADC driver can trigger on either the timer or the lptim
      trigger, but it only uses a Kconfig 'select' statement
      to ensure that the first of the two is present. When the lptim
      trigger is enabled as a loadable module, and the adc driver
      is built-in, we now get a link error:
      
      drivers/iio/adc/stm32-adc.o: In function `stm32_adc_get_trig_extsel':
      stm32-adc.c:(.text+0x4e0): undefined reference to `is_stm32_lptim_trigger'
      
      We could use a second 'select' statement and always have both
      trigger drivers enabled when the adc driver is, but it seems that
      the lptimer trigger was intentionally left optional, so it seems
      better to keep it that way.
      
      This adds a hack to use 'IS_REACHABLE()' rather than 'IS_ENABLED()',
      which avoids the link error, but instead leads to the lptimer trigger
      not being used in the broken configuration. I've added a runtime
      warning for this case to help users figure out what they did wrong
      if this should ever be done by accident.
      
      Fixes: f0b638a7 ("iio: adc: stm32: add support for lptimer triggers")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
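The difference between 'IS_ENABLED()' and 'IS_REACHABLE()' that the stm32 fix relies on can be modelled with plain functions. The sketch below is not the kernel's preprocessor implementation; the enum and function names are hypothetical, and it only encodes the truth table as commonly described: an option built as a module counts as enabled, but is reachable only from code that is itself modular.

```c
#include <stdbool.h>

/* How a tristate Kconfig option was configured (illustrative enum). */
enum sketch_tristate { SKETCH_N, SKETCH_M, SKETCH_Y };

/* Model of IS_ENABLED(): true when the option is built in or built as a module. */
static bool sketch_is_enabled(enum sketch_tristate opt)
{
	return opt == SKETCH_Y || opt == SKETCH_M;
}

/*
 * Model of IS_REACHABLE(): true when the calling code can link against the
 * option's symbols. A modular option is not reachable from built-in code.
 */
static bool sketch_is_reachable(enum sketch_tristate opt, bool caller_is_module)
{
	return opt == SKETCH_Y || (opt == SKETCH_M && caller_is_module);
}
```

In the broken configuration from the message (lptim trigger as a module, ADC driver built in), `sketch_is_enabled()` is true but `sketch_is_reachable()` is false, so the call is compiled out instead of producing an undefined reference.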
    • move libgcc.h to include/linux · 4db2b604
      Committed by Christoph Hellwig
      Introducing a new include/lib directory just for this file totally
      messes up tab completion for include/linux, which is highly annoying.
      
      Move it to include/linux where we have headers for all kinds of other
      lib/ code as well.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
  12. 30 Nov, 2017 7 commits
    • G
    • autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored" · 5d38f049
      Committed by Ian Kent
      Commit 42f46148 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
      allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT
      flag but introduced a semantic change.
      
      In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
      negative dentry case for stat family system calls in follow_automount().
      
      This changed the unconditional triggering of an automount in this case
      to no longer be done and an error returned instead.
      
      This has caused more problems than I expected so reverting the change is
      needed.
      
      In a discussion with Neil Brown it was concluded that the automount(8)
      daemon can implement this change without kernel modifications.  So that
      will be done instead and the autofs module documentation updated with a
      description of the problem and what needs to be done by module users for
      this specific case.
      
      Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.themaw.net
      Fixes: 42f46148 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
      Signed-off-by: Ian Kent <raven@themaw.net>
      Cc: Neil Brown <neilb@suse.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Colin Walters <walters@redhat.com>
      Cc: Ondrej Holy <oholy@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.11+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: migrate: fix an incorrect call of prep_transhuge_page() · 40a899ed
      Committed by Zi Yan
      In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during
      memory hotplug/hot remove prep_transhuge_page() is called incorrectly on
      non-THP pages for migration, when THP is on but THP migration is not
      enabled.  This leads to a bad state of target pages for migration.
      
      By inspecting the code, if called on a non-THP, prep_transhuge_page()
      will
      
       1) change the value of the mapping of (page + 2), since it is used for
          THP deferred list;
      
       2) change the lru value of (page + 1), since it is used for THP's dtor.
      
      Both can lead to data corruption of these two pages.
      
      Andrea said:
       "Pragmatically and from the point of view of the memory_hotplug subsys,
        the effect is a kernel crash when pages are being migrated during a
        memory hot remove offline and migration target pages are found in a
        bad state"
      
      This patch fixes it by only calling prep_transhuge_page() when we are
      certain that the target page is THP.
      
      Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
      Fixes: 8135d892 ("mm: memory_hotplug: memory hotremove supports thp migration")
      Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
      Reported-by: Andrea Reale <ar@linux.vnet.ibm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.14]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce get_user_pages_longterm · 2bb6d283
      Committed by Dan Williams
      Patch series "introduce get_user_pages_longterm()", v2.
      
      Here is a new get_user_pages api for cases where a driver intends to
      keep an elevated page count indefinitely.  This is distinct from usages
      like iov_iter_get_pages where the elevated page counts are transient.
      The iov_iter_get_pages cases immediately turn around and submit the
      pages to a device driver which will put_page when the i/o operation
      completes (under kernel control).
      
      In the longterm case userspace is responsible for dropping the page
      reference at some undefined point in the future.  This is untenable for
      filesystem-dax case where the filesystem is in control of the lifetime
      of the block / page and needs reasonable limits on how long it can wait
      for pages in a mapping to become idle.
      
      Fixing filesystems to actually wait for dax pages to be idle before
      blocks from a truncate/hole-punch operation are repurposed is saved for
      a later patch series.
      
      Also, allowing longterm registration of dax mappings is a future patch
      series that introduces a "map with lease" semantic where the kernel can
      revoke a lease and force userspace to drop its page references.
      
      I have also tagged these for -stable to purposely break cases that might
      assume that longterm memory registrations for filesystem-dax mappings
      were supported by the kernel.  The behavior regression this policy
      change implies is one of the reasons we maintain the "dax enabled.
      Warning: EXPERIMENTAL, use at your own risk" notification when mounting
      a filesystem in dax mode.
      
      It is worth noting the device-dax interface does not suffer the same
      constraints since it does not support file space management operations
      like hole-punch.
      
      This patch (of 4):
      
      Until there is a solution to the dma-to-dax vs truncate problem it is
      not safe to allow long standing memory registrations against
      filesystem-dax vmas.  Device-dax vmas do not have this problem and are
      explicitly allowed.
      
      This is temporary until a "memory registration with layout-lease"
      mechanism can be implemented for the affected sub-systems (RDMA and
      V4L2).
      
      [akpm@linux-foundation.org: use kcalloc()]
      Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: Inki Dae <inki.dae@samsung.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Joonyoung Shim <jy0922.shim@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, hugetlbfs: introduce ->split() to vm_operations_struct · 31383c68
      Committed by Dan Williams
      Patch series "device-dax: fix unaligned munmap handling"
      
      When device-dax is operating in huge-page mode we want it to behave like
      hugetlbfs and fail attempts to split vmas into unaligned ranges.  It
      would be messy to teach the munmap path about device-dax alignment
      constraints in the same (hstate) way that hugetlbfs communicates this
      constraint.  Instead, these patches introduce a new ->split() vm
      operation.
      
      This patch (of 2):
      
      The device-dax interface has similar constraints as hugetlbfs in that it
      requires the munmap path to unmap in huge page aligned units.  Rather
      than add more custom vma handling code in __split_vma() introduce a new
      vm operation to perform this vma specific check.
      
      Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: dee41079 ("/dev/dax, core: file operations and dax-mmap")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
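The kind of check a ->split() hook is meant to perform can be sketched as a stand-alone alignment test. The function name and parameters below are hypothetical and do not reproduce the kernel's vm_operations_struct signature; the sketch only assumes that the huge page size is a power of two.

```c
#include <errno.h>

/*
 * Hypothetical ->split()-style check: refuse to split a mapping at an
 * address that is not aligned to the huge page size backing it.
 * huge_page_size is assumed to be a power of two.
 */
static int sketch_vma_split(unsigned long addr, unsigned long huge_page_size)
{
	if (addr & (huge_page_size - 1))
		return -EINVAL;	/* unaligned split point: fail the munmap */
	return 0;		/* aligned: the generic split may proceed */
}
```

For a 2 MiB huge page, a split at 0x200000 would be allowed while a split at 0x201000 would be rejected, which mirrors the "unmap in huge page aligned units" constraint from the message.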
    • mm: fix device-dax pud write-faults triggered by get_user_pages() · 1501899a
      Committed by Dan Williams
      Currently only get_user_pages_fast() can safely handle the writable gup
      case due to its use of pud_access_permitted() to check whether the pud
      entry is writable.  In the gup slow path pud_write() is used instead of
      pud_access_permitted(), and to date it has been unimplemented: it just
      calls BUG_ON().
      
          kernel BUG at ./include/linux/hugetlb.h:244!
          [..]
          RIP: 0010:follow_devmap_pud+0x482/0x490
          [..]
          Call Trace:
           follow_page_mask+0x28c/0x6e0
           __get_user_pages+0xe4/0x6c0
           get_user_pages_unlocked+0x130/0x1b0
           get_user_pages_fast+0x89/0xb0
           iov_iter_get_pages_alloc+0x114/0x4a0
           nfs_direct_read_schedule_iovec+0xd2/0x350
           ? nfs_start_io_direct+0x63/0x70
           nfs_file_direct_read+0x1e0/0x250
           nfs_file_read+0x90/0xc0
      
      For now this just implements a simple check for the _PAGE_RW bit similar
      to pmd_write.  However, this implies that the gup-slow-path check is
      missing the extra checks that the gup-fast-path performs with
      pud_access_permitted.  Later patches will align all checks to use the
      'access_permitted' helper if the architecture provides it.
      
      Note that the generic 'access_permitted' helper fallback is the simple
      _PAGE_RW check on architectures that do not define the
      'access_permitted' helper(s).
      
      [dan.j.williams@intel.com: fix powerpc compile error]
        Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: a00cc7d9 ("mm, x86: add support for PUD-sized transparent hugepages")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>	[x86]
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
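The "simple check for the _PAGE_RW bit" mentioned in the pud write-fault commit can be pictured as a single mask test. This is a sketch under stated assumptions: the entry is modelled as a raw 64-bit value, the `SKETCH_PAGE_RW` position (bit 1) follows the x86 page-table layout, and the names are not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-style layout: bit 0 = present, bit 1 = read/write. */
#define SKETCH_PAGE_PRESENT (UINT64_C(1) << 0)
#define SKETCH_PAGE_RW      (UINT64_C(1) << 1)

/* Sketch of a pud_write()-like helper: writable iff the RW bit is set. */
static bool sketch_pud_write(uint64_t pud_val)
{
	return (pud_val & SKETCH_PAGE_RW) != 0;
}
```

A check like this only looks at one bit, which is why the message notes that the slow path still lacks the extra checks performed by pud_access_permitted() in the fast path.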
    • kallsyms: take advantage of the new '%px' format · 668533dc
      Committed by Linus Torvalds
      The conditional kallsym hex printing used a special fixed-width '%lx'
      output (KALLSYM_FMT) in preparation for the hashing of %p, but that
      series ended up adding a %px specifier to help with the conversions.
      
      Use it, and avoid the "print pointer as an unsigned long" code.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 28 Nov, 2017 6 commits
    • Drivers: hv: vmbus: Fix a rescind issue · 7fa32e5e
      Committed by K. Y. Srinivasan
      The current rescind processing code will not correctly handle
      the case where the host immediately rescinds a channel that has
      been offered. In this case, we could be blocked in the open call and
      since the channel is rescinded, the host will not respond and we could
      be blocked forever in the vmbus open call. Fix this problem.
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serdev: fix receive_buf return value when no callback · fd00cf81
      Committed by Johan Hovold
      The receive_buf callback is supposed to return the number of bytes
      processed and should specifically not return a negative errno.
      
      Due to missing sanity checks in the serdev tty-port controller, a driver
      not providing a receive_buf callback could cause the flush_to_ldisc()
      worker to spin in a tight loop when the tty buffer pointers are
      incremented with -EINVAL (-22).
      
      The missing sanity checks have now been added to the tty-port
      controller, but let's fix up the serdev-controller helper as well.
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
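The contract described in the serdev commit, "return the number of bytes processed, never a negative errno", can be sketched with a small dispatch helper. The types and names below are hypothetical stand-ins rather than the serdev API; the point is only that a missing callback reports zero bytes consumed instead of an error code that a caller would add to its buffer position.

```c
#include <stddef.h>

/* Hypothetical receive callback: returns how many bytes it consumed. */
typedef int (*sketch_receive_fn)(const unsigned char *buf, size_t count);

/*
 * Sketch of the helper after the fix: with no callback registered it
 * reports 0 bytes processed. Returning -EINVAL (-22) here would move a
 * caller's buffer pointer backwards when the value is added to it.
 */
static int sketch_receive_buf(sketch_receive_fn cb,
			      const unsigned char *buf, size_t count)
{
	if (!cb)
		return 0;
	return cb(buf, count);
}

/* Example callback that accepts everything it is given. */
static int sketch_accept_all(const unsigned char *buf, size_t count)
{
	(void)buf;
	return (int)count;
}
```

A caller that advances its read position by the return value makes no progress with 0, but it also never rewinds, which is the behaviour the message asks for.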
    • debugfs: fix debugfs_real_fops() build error · f50caa9b
      Committed by Arnd Bergmann
      Some drivers use debugfs_real_fops() even when CONFIG_DEBUG_FS is disabled,
      which now leads to a build error:
      
      In file included from include/linux/list.h:9:0,
                       from include/linux/wait.h:7,
                       from include/linux/wait_bit.h:8,
                       from include/linux/fs.h:6,
                       from drivers/net/wireless/broadcom/b43legacy/debugfs.c:26:
      drivers/net/wireless/broadcom/b43legacy/debugfs.c: In function 'b43legacy_debugfs_read':
      drivers/net/wireless/broadcom/b43legacy/debugfs.c:224:23: error: implicit declaration of function 'debugfs_real_fops'; did you mean 'debugfs_create_bool'? [-Werror=implicit-function-declaration]
      
      My first impulse was to add another 'static inline' dummy function
      returning NULL for it, which would work fine. However, most callers
      feed the pointer into container_of(), so it seems a little dangerous
      here. Since all the callers are inside of a read/write file operation
      that gets eliminated in this configuration, having an 'extern'
      declaration seems better here. If it ever gets used in a dangerous
      way, that will now result in a link error.
      
      Fixes: 7c8d4698 ("debugfs: add support for more elaborate ->d_fsdata")
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f50caa9b
    • B
      sunrpc: make the function arg as const · d34971a6
      Bhumika Goyal authored
      Make the struct cache_detail *tmpl argument of the function
      cache_create_net const, as it is only passed to kmemdup, whose
      corresponding argument is const void *.
      Add const to the prototype too.
      Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      d34971a6
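
The reasoning behind this change can be sketched in plain C. This is a userspace analogy under stated assumptions, not the sunrpc code: `struct cache_tmpl`, `dup_bytes`, and `cache_create_copy` are hypothetical stand-ins for the kernel's `struct cache_detail`, `kmemdup`, and `cache_create_net`. Because the template is only read in order to be copied, its parameter can be a pointer to const, which also lets callers pass read-only templates.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a template structure. */
struct cache_tmpl {
    int hash_size;
    const char *name;
};

/* Userspace analogue of a duplicate-memory helper taking const void *. */
void *dup_bytes(const void *src, size_t len)
{
    void *p = malloc(len);

    if (p)
        memcpy(p, src, len);
    return p;
}

/*
 * The template is never modified, only copied, so the parameter is
 * declared const. The returned copy is writable and owned by the caller.
 */
struct cache_tmpl *cache_create_copy(const struct cache_tmpl *tmpl)
{
    return dup_bytes(tmpl, sizeof(*tmpl));
}
```

A caller can then declare its template as `static const struct cache_tmpl` and still pass it in without a cast.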
    • L
      Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds authored
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1751e8a6
    • J
      KVM: Let KVM_SET_SIGNAL_MASK work as advertised · 20b7035c
      Jan H. Schönherr authored
      The KVM API documentation says of the signal mask set via
      KVM_SET_SIGNAL_MASK that "any unblocked signal received [...] will
      cause KVM_RUN to return with -EINTR" and that "the signal will only be
      delivered if not blocked by the original signal mask".
      
      This, however, is only true when the calling task has a signal handler
      registered for the signal. If not, signal evaluation is short-circuited
      for SIG_IGN and SIG_DFL, and the signal is either ignored without
      KVM_RUN returning or the whole process is terminated.
      
      Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
      to that in do_sigtimedwait() to avoid short-circuiting of signals.
      Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      20b7035c
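
The commit message points at do_sigtimedwait() as the model for avoiding the short-circuit. As a loose userspace illustration of that idea, and not the KVM patch itself, the sketch below uses the POSIX `sigtimedwait()` call to pick up a blocked, pending signal even though no handler is installed for it. The helper name `dequeue_blocked_signal` is hypothetical, and the example assumes a single-threaded Linux process.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <time.h>

/*
 * Block signo, raise it so that it becomes pending, then poll for it with
 * a zero timeout. Returns the signal number that was dequeued, or -1 on
 * failure. No handler is installed: the signal is fetched synchronously
 * instead of going through its SIG_DFL or SIG_IGN disposition.
 */
int dequeue_blocked_signal(int signo)
{
    sigset_t set, old;
    siginfo_t info;
    struct timespec zero = { 0, 0 };
    int got;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    raise(signo);                           /* stays pending while blocked */
    got = sigtimedwait(&set, &info, &zero); /* dequeue without a handler */

    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the original mask */
    return got;
}
```

Calling `dequeue_blocked_signal(SIGUSR1)` returns `SIGUSR1` rather than terminating the process, even though the default action for that signal is termination.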