1. 08 1月, 2018 1 次提交
  2. 06 1月, 2018 1 次提交
  3. 09 12月, 2017 1 次提交
  4. 08 12月, 2017 2 次提交
  5. 07 12月, 2017 1 次提交
  6. 06 12月, 2017 6 次提交
    • D
      drm: safely free connectors from connector_iter · a703c550
      Daniel Vetter 提交于
      In
      
      commit 613051da
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Wed Dec 14 00:08:06 2016 +0100
      
          drm: locking&new iterators for connector_list
      
      we've went to extreme lengths to make sure connector iterations works
      in any context, without introducing any additional locking context.
      This worked, except for a small fumble in the implementation:
      
      When we actually race with a concurrent connector unplug event, and
      our temporary connector reference turns out to be the final one, then
      everything breaks: We call the connector release function from
      whatever context we happen to be in, which can be an irq/atomic
      context. And connector freeing grabs all kinds of locks and stuff.
      
      Fix this by creating a specially safe put function for connetor_iter,
      which (in this rare case) punts the cleanup to a worker.
      Reported-by: NBen Widawsky <ben@bwidawsk.net>
      Cc: Ben Widawsky <ben@bwidawsk.net>
      Fixes: 613051da ("drm: locking&new iterators for connector_list")
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Sean Paul <seanpaul@chromium.org>
      Cc: <stable@vger.kernel.org> # v4.11+
      Reviewed-by: NDave Airlie <airlied@gmail.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171204204818.24745-1-daniel.vetter@ffwll.ch
      a703c550
    • C
      KVM: s390: mark irq_state.flags as non-usable · bb64da9a
      Christian Borntraeger 提交于
      Old kernels did not check for zero in the irq_state.flags field and old
      QEMUs did not zero the flag/reserved fields when calling
      KVM_S390_*_IRQ_STATE.  Let's add comments to prevent future uses of
      these fields.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: NThomas Huth <thuth@redhat.com>
      Reviewed-by: NCornelia Huck <cohuck@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      bb64da9a
    • E
      net: remove hlist_nulls_add_tail_rcu() · d7efc6c1
      Eric Dumazet 提交于
      Alexander Potapenko reported use of uninitialized memory [1]
      
      This happens when inserting a request socket into TCP ehash,
      in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.
      
      Bug was added by commit d894ba18 ("soreuseport: fix ordering for
      mixed v4/v6 sockets")
      
      Note that d296ba60 ("soreuseport: Resolve merge conflict for v4/v6
      ordering fix") missed the opportunity to get rid of
      hlist_nulls_add_tail_rcu() :
      
      Both UDP sockets and TCP/DCCP listeners no longer use
      __sk_nulls_add_node_rcu() for their hash insertion.
      
      Since all other sockets have unique 4-tuple, the reuseport status
      has no special meaning, so we can always use hlist_nulls_add_head_rcu()
      for them and save few cycles/instructions.
      
      [1]
      
      ==================================================================
      BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x185/0x1d0 lib/dump_stack.c:52
       kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
       __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
       __sk_nulls_add_node_rcu ./include/net/sock.h:684
       inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
       reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
       inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
       tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
       tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
       tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
       tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
       tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
       ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
       NF_HOOK ./include/linux/netfilter.h:248
       ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
       dst_input ./include/net/dst.h:477
       ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
       NF_HOOK ./include/linux/netfilter.h:248
       ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
       __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
       __netif_receive_skb net/core/dev.c:4336
       netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
       napi_skb_finish net/core/dev.c:4858
       napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
       e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
       e1000_clean_rx_irq+0x1492/0x1d30
      drivers/net/ethernet/intel/e1000/e1000_main.c:4474
       e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
       napi_poll net/core/dev.c:5500
       net_rx_action+0x73c/0x1820 net/core/dev.c:5566
       __do_softirq+0x4b4/0x8dd kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364
       irq_exit+0x203/0x240 kernel/softirq.c:405
       exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
       do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
       common_interrupt+0x86/0x86
      
      Fixes: d894ba18 ("soreuseport: fix ordering for mixed v4/v6 sockets")
      Fixes: d296ba60 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAlexander Potapenko <glider@google.com>
      Acked-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7efc6c1
    • R
      x86,kvm: move qemu/guest FPU switching out to vcpu_run · f775b13e
      Rik van Riel 提交于
      Currently, every time a VCPU is scheduled out, the host kernel will
      first save the guest FPU/xstate context, then load the qemu userspace
      FPU context, only to then immediately save the qemu userspace FPU
      context back to memory. When scheduling in a VCPU, the same extraneous
      FPU loads and saves are done.
      
      This could be avoided by moving from a model where the guest FPU is
      loaded and stored with preemption disabled, to a model where the
      qemu userspace FPU is swapped out for the guest FPU context for
      the duration of the KVM_RUN ioctl.
      
      This is done under the VCPU mutex, which is also taken when other
      tasks inspect the VCPU FPU context, so the code should already be
      safe for this change. That should come as no surprise, given that
      s390 already has this optimization.
      
      This can fix a bug where KVM calls get_user_pages while owning the
      FPU, and the file system ends up requesting the FPU again:
      
          [258270.527947]  __warn+0xcb/0xf0
          [258270.527948]  warn_slowpath_null+0x1d/0x20
          [258270.527951]  kernel_fpu_disable+0x3f/0x50
          [258270.527953]  __kernel_fpu_begin+0x49/0x100
          [258270.527955]  kernel_fpu_begin+0xe/0x10
          [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
          [258270.527961]  crypto_shash_update+0x3f/0x110
          [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
          [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
          [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
          [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
          [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
          [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
          [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
          [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
          [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
          [258270.528002]  shrink_slab.part.40+0x1f5/0x420
          [258270.528004]  shrink_node+0x22c/0x320
          [258270.528006]  do_try_to_free_pages+0xf5/0x330
          [258270.528008]  try_to_free_pages+0xe9/0x190
          [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
          [258270.528011]  __alloc_pages_nodemask+0x209/0x260
          [258270.528014]  alloc_pages_vma+0x1f1/0x250
          [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
          [258270.528021]  handle_mm_fault+0xfd3/0x1330
          [258270.528025]  __get_user_pages+0x113/0x640
          [258270.528027]  get_user_pages+0x4f/0x60
          [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
          [258270.528108]  try_async_pf+0x66/0x230 [kvm]
          [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
          [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
          [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
          [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
      
      No performance changes were detected in quick ping-pong tests on
      my 4 socket system, which is expected since an FPU+xstate load is
      on the order of 0.1us, while ping-ponging between CPUs is on the
      order of 20us, and somewhat noisy.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Suggested-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      [Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu,
       which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      f775b13e
    • N
      net_sched: red: Avoid illegal values · 8afa10cb
      Nogah Frankel 提交于
      Check the qmin & qmax values doesn't overflow for the given Wlog value.
      Check that qmin <= qmax.
      
      Fixes: a7834745 ("[PKT_SCHED]: Generic RED layer")
      Signed-off-by: NNogah Frankel <nogahf@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8afa10cb
    • N
      net_sched: red: Avoid devision by zero · 5c472203
      Nogah Frankel 提交于
      Do not allow delta value to be zero since it is used as a divisor.
      
      Fixes: 8af2a218 ("sch_red: Adaptative RED AQM")
      Signed-off-by: NNogah Frankel <nogahf@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5c472203
  7. 05 12月, 2017 2 次提交
    • H
      bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type · c895f6f7
      Hendrik Brueckner 提交于
      Commit 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
      program type") introduced the bpf_perf_event_data structure which
      exports the pt_regs structure.  This is OK for multiple architectures
      but fail for s390 and arm64 which do not export pt_regs.  Programs
      using them, for example, the bpf selftest fail to compile on these
      architectures.
      
      For s390, exporting the pt_regs is not an option because s390 wants
      to allow changes to it.  For arm64, there is a user_pt_regs structure
      that covers parts of the pt_regs structure for use by user space.
      
      To solve the broken uapi for s390 and arm64, introduce an abstract
      type for pt_regs and add an asm/bpf_perf_event.h file that concretes
      the type.  An asm-generic header file covers the architectures that
      export pt_regs today.
      
      The arch-specific enablement for s390 and arm64 follows in separate
      commits.
      Reported-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Fixes: 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
      Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Reviewed-and-tested-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c895f6f7
    • W
      irqdesc: Use bool return type instead of int · 4ce413d1
      Will Deacon 提交于
      The irq_balancing_disabled and irq_is_percpu{,_devid} functions are
      clearly intended to return bool like the functions in
      kernel/irq/settings.h, but actually return an int containing a masked
      value of desc->status_use_accessors. This can lead to subtle breakage
      if, for example, the return value is subsequently truncated when
      assigned to a narrower type.
      
      As Linus points out:
      
      | In particular, what can (and _has_ happened) is that people end up
      | using these functions that return true or false, and they assign the
      | result to something like a bitfield (or a char) or whatever.
      |
      | And the code looks *obviously* correct, when you have things like
      |
      |      dev->percpu = irq_is_percpu_devid(dev->irq);
      |
      | and that "percpu" thing is just one status bit among many. It may even
      | *work*, because maybe that "percpu" flag ends up not being all that
      | important, or it just happens to never be set on the particular
      | hardware that people end up testing.
      |
      | But while it looks obviously correct, and might even work, it's really
      | fundamentally broken. Because that "true or false" function didn't
      | actually return 0/1, it returned 0 or 0x20000.
      |
      | And 0x20000 may not fit in a bitmask or a "char" or whatever.
      
      Fix the problem by consistently using bool as the return type for these
      functions.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: marc.zyngier@arm.com
      Link: https://lkml.kernel.org/r/1512142179-24616-1-git-send-email-will.deacon@arm.com
      4ce413d1
  8. 04 12月, 2017 1 次提交
  9. 02 12月, 2017 3 次提交
    • A
      iio: stm32: fix adc/trigger link error · 6d745ee8
      Arnd Bergmann 提交于
      The ADC driver can trigger on either the timer or the lptim
      trigger, but it only uses a Kconfig 'select' statement
      to ensure that the first of the two is present. When the lptim
      trigger is enabled as a loadable module, and the adc driver
      is built-in, we now get a link error:
      
      drivers/iio/adc/stm32-adc.o: In function `stm32_adc_get_trig_extsel':
      stm32-adc.c:(.text+0x4e0): undefined reference to `is_stm32_lptim_trigger'
      
      We could use a second 'select' statement and always have both
      trigger drivers enabled when the adc driver is, but it seems that
      the lptimer trigger was intentionally left optional, so it seems
      better to keep it that way.
      
      This adds a hack to use 'IS_REACHABLE()' rather than 'IS_ENABLED()',
      which avoids the link error, but instead leads to the lptimer trigger
      not being used in the broken configuration. I've added a runtime
      warning for this case to help users figure out what they did wrong
      if this should ever be done by accident.
      
      Fixes: f0b638a7 ("iio: adc: stm32: add support for lptimer triggers")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
      6d745ee8
    • C
      move libgcc.h to include/linux · 4db2b604
      Christoph Hellwig 提交于
      Introducing a new include/lib directory just for this file totally
      messes up tab completion for include/linux, which is highly annoying.
      
      Move it to include/linux where we have headers for all kinds of other
      lib/ code as well.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
      4db2b604
    • X
      sctp: abandon the whole msg if one part of a fragmented message is abandoned · e5f61296
      Xin Long 提交于
      As rfc3758#section-3.1 demands:
      
         A3) When a TSN is "abandoned", if it is part of a fragmented message,
             all other TSN's within that fragmented message MUST be abandoned
             at the same time.
      
      Besides, if it couldn't handle this, the rest frags would never get
      assembled in peer side.
      
      This patch supports it by adding abandoned flag in sctp_datamsg, when
      one chunk is being abandoned, set chunk->msg->abandoned as well. Next
      time when checking for abandoned, go checking chunk->msg->abandoned
      first.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5f61296
  10. 30 11月, 2017 11 次提交
    • C
      act_sample: get rid of tcf_sample_cleanup_rcu() · 90a6ec85
      Cong Wang 提交于
      Similar to commit d7fb60b9 ("net_sched: get rid of tcfa_rcu"),
      TC actions don't need to respect RCU grace period, because it
      is either just detached from tc filter (standalone case) or
      it is removed together with tc filter (bound case) in which case
      RCU grace period is already respected at filter layer.
      
      Fixes: 5c5670fa ("net/sched: Introduce sample tc action")
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90a6ec85
    • G
    • I
      autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored" · 5d38f049
      Ian Kent 提交于
      Commit 42f46148 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
      allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT
      flag but introduced a semantic change.
      
      In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
      negative dentry case for stat family system calls in follow_automount().
      
      This changed the unconditional triggering of an automount in this case
      to no longer be done and an error returned instead.
      
      This has caused more problems than I expected so reverting the change is
      needed.
      
      In a discussion with Neil Brown it was concluded that the automount(8)
      daemon can implement this change without kernel modifications.  So that
      will be done instead and the autofs module documentation updated with a
      description of the problem and what needs to be done by module users for
      this specific case.
      
      Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.themaw.net
      Fixes: 42f46148 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
      Signed-off-by: NIan Kent <raven@themaw.net>
      Cc: Neil Brown <neilb@suse.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Colin Walters <walters@redhat.com>
      Cc: Ondrej Holy <oholy@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.11+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d38f049
    • Z
      mm: migrate: fix an incorrect call of prep_transhuge_page() · 40a899ed
      Zi Yan 提交于
      In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during
      memory hotplug/hot remove prep_transhuge_page() is called incorrectly on
      non-THP pages for migration, when THP is on but THP migration is not
      enabled.  This leads to a bad state of target pages for migration.
      
      By inspecting the code, if called on a non-THP, prep_transhuge_page()
      will
      
       1) change the value of the mapping of (page + 2), since it is used for
          THP deferred list;
      
       2) change the lru value of (page + 1), since it is used for THP's dtor.
      
      Both can lead to data corruption of these two pages.
      
      Andrea said:
       "Pragmatically and from the point of view of the memory_hotplug subsys,
        the effect is a kernel crash when pages are being migrated during a
        memory hot remove offline and migration target pages are found in a
        bad state"
      
      This patch fixes it by only calling prep_transhuge_page() when we are
      certain that the target page is THP.
      
      Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
      Fixes: 8135d892 ("mm: memory_hotplug: memory hotremove supports thp migration")
      Signed-off-by: NZi Yan <zi.yan@cs.rutgers.edu>
      Reported-by: NAndrea Reale <ar@linux.vnet.ibm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.14]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      40a899ed
    • D
      mm: introduce get_user_pages_longterm · 2bb6d283
      Dan Williams 提交于
      Patch series "introduce get_user_pages_longterm()", v2.
      
      Here is a new get_user_pages api for cases where a driver intends to
      keep an elevated page count indefinitely.  This is distinct from usages
      like iov_iter_get_pages where the elevated page counts are transient.
      The iov_iter_get_pages cases immediately turn around and submit the
      pages to a device driver which will put_page when the i/o operation
      completes (under kernel control).
      
      In the longterm case userspace is responsible for dropping the page
      reference at some undefined point in the future.  This is untenable for
      filesystem-dax case where the filesystem is in control of the lifetime
      of the block / page and needs reasonable limits on how long it can wait
      for pages in a mapping to become idle.
      
      Fixing filesystems to actually wait for dax pages to be idle before
      blocks from a truncate/hole-punch operation are repurposed is saved for
      a later patch series.
      
      Also, allowing longterm registration of dax mappings is a future patch
      series that introduces a "map with lease" semantic where the kernel can
      revoke a lease and force userspace to drop its page references.
      
      I have also tagged these for -stable to purposely break cases that might
      assume that longterm memory registrations for filesystem-dax mappings
      were supported by the kernel.  The behavior regression this policy
      change implies is one of the reasons we maintain the "dax enabled.
      Warning: EXPERIMENTAL, use at your own risk" notification when mounting
      a filesystem in dax mode.
      
      It is worth noting the device-dax interface does not suffer the same
      constraints since it does not support file space management operations
      like hole-punch.
      
      This patch (of 4):
      
      Until there is a solution to the dma-to-dax vs truncate problem it is
      not safe to allow long standing memory registrations against
      filesytem-dax vmas.  Device-dax vmas do not have this problem and are
      explicitly allowed.
      
      This is temporary until a "memory registration with layout-lease"
      mechanism can be implemented for the affected sub-systems (RDMA and
      V4L2).
      
      [akpm@linux-foundation.org: use kcalloc()]
      Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Suggested-by: NChristoph Hellwig <hch@lst.de>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: Inki Dae <inki.dae@samsung.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Joonyoung Shim <jy0922.shim@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2bb6d283
    • D
      mm, hugetlbfs: introduce ->split() to vm_operations_struct · 31383c68
      Dan Williams 提交于
      Patch series "device-dax: fix unaligned munmap handling"
      
      When device-dax is operating in huge-page mode we want it to behave like
      hugetlbfs and fail attempts to split vmas into unaligned ranges.  It
      would be messy to teach the munmap path about device-dax alignment
      constraints in the same (hstate) way that hugetlbfs communicates this
      constraint.  Instead, these patches introduce a new ->split() vm
      operation.
      
      This patch (of 2):
      
      The device-dax interface has similar constraints as hugetlbfs in that it
      requires the munmap path to unmap in huge page aligned units.  Rather
      than add more custom vma handling code in __split_vma() introduce a new
      vm operation to perform this vma specific check.
      
      Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: dee41079 ("/dev/dax, core: file operations and dax-mmap")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31383c68
    • D
      mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE · e4e40e02
      Dan Williams 提交于
      In response to compile breakage introduced by a series that added the
      pud_write helper to x86, Stephen notes:
      
          did you consider using the other paradigm:
      
          In arch include files:
          #define pud_write       pud_write
          static inline int pud_write(pud_t pud)
           .....
      
          Then in include/asm-generic/pgtable.h:
      
          #ifndef pud_write
          tatic inline int pud_write(pud_t pud)
          {
                  ....
          }
          #endif
      
          If you had, then the powerpc code would have worked ... ;-) and many
          of the other interfaces in include/asm-generic/pgtable.h are
          protected that way ...
      
      Given that some architecture already define pmd_write() as a macro, it's
      a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.
      
      Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Suggested-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e4e40e02
    • D
      mm: fix device-dax pud write-faults triggered by get_user_pages() · 1501899a
      Dan Williams 提交于
      Currently only get_user_pages_fast() can safely handle the writable gup
      case due to its use of pud_access_permitted() to check whether the pud
      entry is writable.  In the gup slow path pud_write() is used instead of
      pud_access_permitted() and to date it has been unimplemented, just calls
      BUG_ON().
      
          kernel BUG at ./include/linux/hugetlb.h:244!
          [..]
          RIP: 0010:follow_devmap_pud+0x482/0x490
          [..]
          Call Trace:
           follow_page_mask+0x28c/0x6e0
           __get_user_pages+0xe4/0x6c0
           get_user_pages_unlocked+0x130/0x1b0
           get_user_pages_fast+0x89/0xb0
           iov_iter_get_pages_alloc+0x114/0x4a0
           nfs_direct_read_schedule_iovec+0xd2/0x350
           ? nfs_start_io_direct+0x63/0x70
           nfs_file_direct_read+0x1e0/0x250
           nfs_file_read+0x90/0xc0
      
      For now this just implements a simple check for the _PAGE_RW bit similar
      to pmd_write.  However, this implies that the gup-slow-path check is
      missing the extra checks that the gup-fast-path performs with
      pud_access_permitted.  Later patches will align all checks to use the
      'access_permitted' helper if the architecture provides it.
      
      Note that the generic 'access_permitted' helper fallback is the simple
      _PAGE_RW check on architectures that do not define the
      'access_permitted' helper(s).
      
      [dan.j.williams@intel.com: fix powerpc compile error]
        Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: a00cc7d9 ("mm, x86: add support for PUD-sized transparent hugepages")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>	[x86]
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1501899a
    • X
      trace/xdp: fix compile warning: 'struct bpf_map' declared inside parameter list · 23721a75
      Xie XiuQi 提交于
      We meet this compile warning, which caused by missing bpf.h in xdp.h.
      
      In file included from ./include/trace/events/xdp.h:10:0,
                       from ./include/linux/bpf_trace.h:6,
                       from drivers/net/ethernet/intel/i40e/i40e_txrx.c:29:
      ./include/trace/events/xdp.h:93:17: warning: ‘struct bpf_map’ declared inside parameter list will not be visible outside of this definition or declaration
          const struct bpf_map *map, u32 map_index),
                       ^
      ./include/linux/tracepoint.h:187:34: note: in definition of macro ‘__DECLARE_TRACE’
        static inline void trace_##name(proto)    \
                                        ^~~~~
      ./include/linux/tracepoint.h:352:24: note: in expansion of macro ‘PARAMS’
        __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
                              ^~~~~~
      ./include/linux/tracepoint.h:477:2: note: in expansion of macro ‘DECLARE_TRACE’
        DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
        ^~~~~~~~~~~~~
      ./include/linux/tracepoint.h:477:22: note: in expansion of macro ‘PARAMS’
        DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
                            ^~~~~~
      ./include/trace/events/xdp.h:89:1: note: in expansion of macro ‘DEFINE_EVENT’
       DEFINE_EVENT(xdp_redirect_template, xdp_redirect,
       ^~~~~~~~~~~~
      ./include/trace/events/xdp.h:90:2: note: in expansion of macro ‘TP_PROTO’
        TP_PROTO(const struct net_device *dev,
        ^~~~~~~~
      ./include/trace/events/xdp.h:93:17: warning: ‘struct bpf_map’ declared inside parameter list will not be visible outside of this definition or declaration
          const struct bpf_map *map, u32 map_index),
                       ^
      ./include/linux/tracepoint.h:203:38: note: in definition of macro ‘__DECLARE_TRACE’
        register_trace_##name(void (*probe)(data_proto), void *data) \
                                            ^~~~~~~~~~
      ./include/linux/tracepoint.h:354:4: note: in expansion of macro ‘PARAMS’
          PARAMS(void *__data, proto),   \
          ^~~~~~
      Reported-by: NHuang Daode <huangdaode@hisilicon.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Fixes: 8d3b778f ("xdp: tracepoint xdp_redirect also need a map argument")
      Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      23721a75
    • C
      drm/ttm: fix populate_and_map() functions once more · 1569d651
      Christian König 提交于
      This reverts "drm/ttm: Fix configuration error around populate_and_map()
      functions".
      
      This fix has gone into the wrong direction. Those helpers should be
      available even when neither CONFIG_INTEL_IOMMU nor CONFIG_SWIOTLB are
      set.
      Signed-off-by: NChristian König <christian.koenig@amd.com>
      Reviewed-by: NMichel Dänzer <michel.daenzer@amd.com>
      Acked-by: NAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      1569d651
    • L
      kallsyms: take advantage of the new '%px' format · 668533dc
      Linus Torvalds 提交于
      The conditional kallsym hex printing used a special fixed-width '%lx'
      output (KALLSYM_FMT) in preparation for the hashing of %p, but that
      series ended up adding a %px specifier to help with the conversions.
      
      Use it, and avoid the "print pointer as an unsigned long" code.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      668533dc
  11. 29 11月, 2017 3 次提交
  12. 28 11月, 2017 7 次提交
    • K
      Drivers: hv: vmbus: Fix a rescind issue · 7fa32e5e
      K. Y. Srinivasan 提交于
      The current rescind processing code will not correctly handle
      the case where the host immediately rescinds a channel that has
      been offerred. In this case, we could be blocked in the open call and
      since the channel is rescinded, the host will not respond and we could
      be blocked forever in the vmbus open call.i Fix this problem.
      Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fa32e5e
    • J
      serdev: fix receive_buf return value when no callback · fd00cf81
      Johan Hovold 提交于
      The receive_buf callback is supposed to return the number of bytes
      processed and should specifically not return a negative errno.
      
      Due to missing sanity checks in the serdev tty-port controller, a driver
      not providing a receive_buf callback could cause the flush_to_ldisc()
      worker to spin in a tight loop when the tty buffer pointers are
      incremented with -EINVAL (-22).
      
      The missing sanity checks have now been added to the tty-port
      controller, but let's fix up the serdev-controller helper as well.
      Signed-off-by: NJohan Hovold <johan@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd00cf81
    • A
      debugfs: fix debugfs_real_fops() build error · f50caa9b
      Arnd Bergmann 提交于
      Some drivers use debugfs_real_fops() even when CONFIG_DEBUG_FS is disabled,
      which now leads to a build error:
      
      In file included from include/linux/list.h:9:0,
                       from include/linux/wait.h:7,
                       from include/linux/wait_bit.h:8,
                       from include/linux/fs.h:6,
                       from drivers/net/wireless/broadcom/b43legacy/debugfs.c:26:
      drivers/net/wireless/broadcom/b43legacy/debugfs.c: In function 'b43legacy_debugfs_read':
      drivers/net/wireless/broadcom/b43legacy/debugfs.c:224:23: error: implicit declaration of function 'debugfs_real_fops'; did you mean 'debugfs_create_bool'? [-Werror=implicit-function-declaration]
      
      My first impulse was to add another 'static inline' dummy function
      returning NULL for it, which would work fine. However, most callers
      feed the pointer into container_of(), so it seems a little dangerous
      here. Since all the callers are inside of a read/write file operation
      that gets eliminated in this configuration, so having an 'extern'
      declaration seems better here. If it ever gets used in a dangerous
      way, that will now result in a link error.
      
      Fixes: 7c8d4698 ("debugfs: add support for more elaborate ->d_fsdata")
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f50caa9b
    • M
      USB: core: Add type-specific length check of BOS descriptors · 81cf4a45
      Masakazu Mokuno 提交于
      As most of BOS descriptors are longer in length than their header
      'struct usb_dev_cap_header', comparing solely with it is not sufficient
      to avoid out-of-bounds access to BOS descriptors.
      
      This patch adds descriptor type specific length check in
      usb_get_bos_descriptor() to fix the issue.
      Signed-off-by: NMasakazu Mokuno <masakazu.mokuno@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81cf4a45
    • B
      sunrpc: make the function arg as const · d34971a6
      Bhumika Goyal 提交于
      Make the struct cache_detail *tmpl argument of the function
      cache_create_net as const as it is only getting passed to kmemup having
      the argument as const void *.
      Add const to the prototype too.
      Signed-off-by: NBhumika Goyal <bhumirks@gmail.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      d34971a6
    • L
      Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds 提交于
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1751e8a6
    • J
      KVM: Let KVM_SET_SIGNAL_MASK work as advertised · 20b7035c
      Jan H. Schönherr 提交于
      KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
      "any unblocked signal received [...] will cause KVM_RUN to return with
      -EINTR" and that "the signal will only be delivered if not blocked by
      the original signal mask".
      
      This, however, is only true, when the calling task has a signal handler
      registered for a signal. If not, signal evaluation is short-circuited for
      SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
      returning or the whole process is terminated.
      
      Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
      to that in do_sigtimedwait() to avoid short-circuiting of signals.
      Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      20b7035c
  13. 27 11月, 2017 1 次提交