1. 30 Apr, 2019 2 commits
  2. 20 Apr, 2019 1 commit
    • M
      powerpc: Add force enable of DAWR on P9 option · c1fe190c
      Michael Neuling authored
      This adds a flag so that the DAWR can be enabled on P9 via:
        echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous
      
      The DAWR was previously force disabled on POWER9 in:
        96541531 powerpc: Disable DAWR in the base POWER9 CPU features
      Also see Documentation/powerpc/DAWR-POWER9.txt
      
      This is a dangerous setting, USE AT YOUR OWN RISK.
      
      Some users may not care about a bad user crashing their box
      (i.e. single-user/desktop systems) and really want the DAWR.  This
      allows them to force enable DAWR.
      
      This flag can also be used to disable DAWR access.  Once this is
      cleared, all DAWR access should be cleared immediately, and your
      machine is once again safe from crashing.
      
      Userspace may get confused by toggling this. If DAWR is force
      enabled/disabled between getting the number of breakpoints (via
      PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an
      inconsistent view of what's available. Similarly for guests.
      
      For the DAWR to be enabled in a KVM guest, the DAWR needs to be force
      enabled in the host AND the guest. For this reason, this won't work on
      POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the
      dawr_enable_dangerous file will fail if the hypervisor doesn't support
      writing the DAWR.
      
      To double check the DAWR is working, run this kernel selftest:
        tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
      Any errors/failures/skips mean something is wrong.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c1fe190c
  3. 29 Mar, 2019 5 commits
    • P
      Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION · e2788c4a
      Paolo Bonzini authored
      The documentation does not mention how to delete a slot; add the
      information.
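
      For context, a slot is deleted by calling KVM_SET_USER_MEMORY_REGION
      with the slot's memory_size set to zero.  A minimal userspace sketch
      (the fd and the slot number here are illustrative):

        #include <string.h>
        #include <sys/ioctl.h>
        #include <linux/kvm.h>

        /* Delete memslot 1 of an existing VM: keep the slot id, zero the size.
         * vm_fd is a VM file descriptor obtained via KVM_CREATE_VM. */
        void delete_slot(int vm_fd)
        {
                struct kvm_userspace_memory_region region;

                memset(&region, 0, sizeof(region));
                region.slot = 1;          /* slot to delete */
                region.memory_size = 0;   /* size 0 means delete */

                ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
        }
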
      Reported-by: Nathaniel McCallum <npmccallum@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e2788c4a
    • S
      KVM: doc: Document the life cycle of a VM and its resources · 919f6cd8
      Sean Christopherson authored
      The series to add memcg accounting to KVM allocations[1] states:
      
        There are many KVM kernel memory allocations which are tied to the
        life of the VM process and should be charged to the VM process's
        cgroup.
      
      While it is correct to account KVM kernel allocations to the cgroup of
      the process that created the VM, it's technically incorrect to state
      that the KVM kernel memory allocations are tied to the life of the VM
      process.  This is because the VM itself, i.e. struct kvm, is not tied to
      the life of the process which created it, rather it is tied to the life
      of its associated file descriptor.  In other words, kvm_destroy_vm() is
      not invoked until fput() decrements its associated file's refcount to
      zero.  A simple example is to fork() in Qemu and have the child sleep
      indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
      descriptor *and* the rogue child is killed.
      
      The allocations are guaranteed to be *accounted* to the process which
      created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
      ioctl() with -EIO if kvm->mm != current->mm.  I.e. the child can keep
      the VM "alive" but can't do anything useful with its reference.
      
      Note that because 'struct kvm' also holds a reference to the mm_struct
      of its owner, the above behavior also applies to userspace allocations.
      
      Given that mucking with a VM's file descriptor can lead to subtle and
      undesirable behavior, e.g. memcg charges persisting after a VM is shut
      down, explicitly document a VM's lifecycle and its impact on the VM's
      resources.
      
      Alternatively, KVM could aggressively free resources when the creating
      process exits, e.g. via mmu_notifier->release().  However, mmu_notifier
      isn't guaranteed to be available, and freeing resources when the creator
      exits is likely to be error prone and fragile as KVM would need to
      ensure that it only freed resources that are truly out of reach. In
      practice, the existing behavior shouldn't be problematic as a properly
      configured system will prevent a child process from being moved out of
      the appropriate cgroup hierarchy, i.e. prevent hiding the process from
      the OOM killer, and will prevent an unprivileged user from being able
      to hold a reference to struct kvm via another method, e.g. debugfs.
      
      [1] https://patchwork.kernel.org/patch/10806707/
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      919f6cd8
    • S
      KVM: Reject device ioctls from processes other than the VM's creator · ddba9180
      Sean Christopherson authored
      KVM's API requires that ioctls be issued from the same process
      that created the VM.  In other words, userspace can play games with a
      VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the
      creator can do anything useful.  Explicitly reject device ioctls that
      are issued by a process other than the VM's creator, and update KVM's
      API documentation to extend its requirements to device ioctls.
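
      The added guard presumably mirrors the existing per-VM/vCPU check; a
      sketch of the idea (not necessarily the exact patch):

        static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
                                     unsigned long arg)
        {
                struct kvm_device *dev = filp->private_data;

                if (dev->kvm->mm != current->mm)
                        return -EIO;
                ...
        }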
      
      Fixes: 852b6d57 ("kvm: add device control API")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ddba9180
    • S
      KVM: doc: Fix incorrect word ordering regarding supported use of APIs · 5e124900
      Sean Christopherson authored
      Per Paolo[1], instantiating multiple VMs in a single process is legal;
      but this conflicts with KVM's API documentation, which states:
      
        The only supported use is one virtual machine per process, and one
        vcpu per thread.
      
      However, an earlier section in the documentation states:
      
         Only run VM ioctls from the same process (address space) that was used
         to create the VM.
      
      and:
      
         Only run vcpu ioctls from the same thread that was used to create the
         vcpu.
      
      This suggests that the conflicting documentation is simply an incorrect
      ordering of words, i.e. what's really meant is that a virtual machine
      can't be shared across multiple processes and a vCPU can't be shared
      across multiple threads.
      
      Tweak the blurb on issuing ioctls to use a more assertive tone, and
      rewrite the "supported use" sentence to reference said blurb instead of
      poorly restating it in different terms.
      
      Opportunistically add missing punctuation.
      
      [1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com
      
      Fixes: 9c1b96e3 ("KVM: Document basic API")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      [Improve notes on asynchronous ioctl]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5e124900
    • S
      KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' · 47c42e6b
      Sean Christopherson authored
      The cr4_pae flag is a bit of a misnomer, its purpose is really to track
      whether the guest PTE that is being shadowed is a 4-byte entry or an
      8-byte entry.  Prior to supporting nested EPT, the size of the gpte was
      reflected purely by CR4.PAE.  KVM fudged things a bit for direct sptes,
      but it was mostly harmless since the size of the gpte never mattered.
      Now that a spte may be tracking an indirect EPT entry, relying on
      CR4.PAE is wrong, and the flag's name is misleading.
      
      For direct shadow pages, force the gpte_size to '1' as they are always
      8-byte entries; EPT entries can only be 8-bytes and KVM always uses
      8-byte entries for NPT and its identity map (when running with EPT but
      not unrestricted guest).
      
      Likewise, nested EPT entries are always 8-bytes.  Nested EPT presents a
      unique scenario as the size of the entries are not dictated by CR4.PAE,
      but neither is the shadow page a direct map.  To handle this scenario,
      set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination,
      to denote a nested EPT shadow page.  Use the information to avoid
      incorrectly zapping an unsync'd indirect page in __kvm_sync_page().
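
      A sketch of how that impossible combination could be tested for, assuming
      the kvm_mmu_page_role bitfields named above (illustrative, not the exact
      patch):

        /* cr0_wp=1 together with smap_andnot_wp=1 cannot occur for a real
         * guest configuration, so the pair can tag nested EPT shadow pages. */
        static inline bool is_nested_ept_page(union kvm_mmu_page_role role)
        {
                return role.cr0_wp && role.smap_andnot_wp;
        }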
      
      Providing a consistent and accurate gpte_size fixes a bug reported by
      Vitaly where fast_cr3_switch() always fails when switching from L2 to
      L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages,
      whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      47c42e6b
  4. 21 Mar, 2019 1 commit
  5. 17 Mar, 2019 1 commit
  6. 16 Mar, 2019 1 commit
    • S
      KVM: doc: Document the life cycle of a VM and its resources · eca6be56
      Sean Christopherson authored
      The series to add memcg accounting to KVM allocations[1] states:
      
        There are many KVM kernel memory allocations which are tied to the
        life of the VM process and should be charged to the VM process's
        cgroup.
      
      While it is correct to account KVM kernel allocations to the cgroup of
      the process that created the VM, it's technically incorrect to state
      that the KVM kernel memory allocations are tied to the life of the VM
      process.  This is because the VM itself, i.e. struct kvm, is not tied to
      the life of the process which created it, rather it is tied to the life
      of its associated file descriptor.  In other words, kvm_destroy_vm() is
      not invoked until fput() decrements its associated file's refcount to
      zero.  A simple example is to fork() in Qemu and have the child sleep
      indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
      descriptor *and* the rogue child is killed.
      
      The allocations are guaranteed to be *accounted* to the process which
      created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
      ioctl() with -EIO if kvm->mm != current->mm.  I.e. the child can keep
      the VM "alive" but can't do anything useful with its reference.
      
      Note that because 'struct kvm' also holds a reference to the mm_struct
      of its owner, the above behavior also applies to userspace allocations.
      
      Given that mucking with a VM's file descriptor can lead to subtle and
      undesirable behavior, e.g. memcg charges persisting after a VM is shut
      down, explicitly document a VM's lifecycle and its impact on the VM's
      resources.
      
      Alternatively, KVM could aggressively free resources when the creating
      process exits, e.g. via mmu_notifier->release().  However, mmu_notifier
      isn't guaranteed to be available, and freeing resources when the creator
      exits is likely to be error prone and fragile as KVM would need to
      ensure that it only freed resources that are truly out of reach. In
      practice, the existing behavior shouldn't be problematic as a properly
      configured system will prevent a child process from being moved out of
      the appropriate cgroup hierarchy, i.e. prevent hiding the process from
      the OOM killer, and will prevent an unprivileged user from being able
      to hold a reference to struct kvm via another method, e.g. debugfs.
      
      [1] https://patchwork.kernel.org/patch/10806707/
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      eca6be56
  7. 15 Mar, 2019 1 commit
  8. 13 Mar, 2019 3 commits
  9. 12 Mar, 2019 1 commit
  10. 09 Mar, 2019 2 commits
  11. 08 Mar, 2019 4 commits
    • D
      lib/lzo: separate lzo-rle from lzo · 45ec975e
      Dave Rodgman authored
      To prevent any issues with persistent data, separate lzo-rle from lzo so
      that it is treated as a separate algorithm, and lzo is still available.
      
      Link: http://lkml.kernel.org/r/20190205155944.16007-3-dave.rodgman@arm.com
      Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com>
      Cc: Matt Sealey <matt.sealey@arm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <nitingupta910@gmail.com>
      Cc: Richard Purdie <rpurdie@openedhand.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      45ec975e
    • D
      lib/lzo: implement run-length encoding · 5ee4014a
      Dave Rodgman authored
      Patch series "lib/lzo: run-length encoding support", v5.
      
      Following on from the previous lzo-rle patchset:
      
        https://lkml.org/lkml/2018/11/30/972
      
      This patchset contains only the RLE patches, and should be applied on
      top of the non-RLE patches ( https://lkml.org/lkml/2019/2/5/366 ).
      
      Previously, some questions were raised around the RLE patches.  I've
      done some additional benchmarking to answer these questions.  In short:
      
       - RLE offers significant additional performance (data-dependent)
      
       - I didn't measure any regressions that were clearly outside the noise
      
      One concern with this patchset was around performance - specifically,
      measuring RLE impact separately from Matt Sealey's patches (CTZ & fast
      copy).  I have done some additional benchmarking which I hope clarifies
      the benefits of each part of the patchset.
      
      Firstly, I've captured some memory via /dev/fmem from a Chromebook with
      many tabs open which is starting to swap, and then split this into 4178
      4k pages.  I've excluded the all-zero pages (as zram does), and also the
      no-zero pages (which won't tell us anything about RLE performance).
      This should give a realistic test dataset for zram.  What I found was
      that the data is VERY bimodal: 44% of pages in this dataset contain 5%
      or fewer zeros, and 44% contain over 90% zeros (30% if you include the
      no-zero pages).  This supports the idea of special-casing zeros in zram.
      
      Next, I've benchmarked four variants of lzo on these pages (on 64-bit
      Arm at max frequency): baseline LZO; baseline + Matt Sealey's patches
      (aka MS); baseline + RLE only; baseline + MS + RLE.  Numbers are for
      weighted roundtrip throughput (the weighting reflects that zram does
      more compression than decompression).
      
        https://drive.google.com/file/d/1VLtLjRVxgUNuWFOxaGPwJYhl_hMQXpHe/view?usp=sharing
      
      Matt's patches help in all cases for Arm (and no effect on Intel), as
      expected.
      
      RLE also behaves as expected: with few zeros present, it makes no
      difference; above ~75%, it gives a good improvement (50 - 300 MB/s on
      top of the benefit from Matt's patches).
      
      Best performance is seen with both MS and RLE patches.
      
      Finally, I have benchmarked the same dataset on an x86-64 device.  Here,
      the MS patches make no difference (as expected); RLE helps, similarly as
      on Arm.  There were no definite regressions; allowing for observational
      error, 0.1% (3/4178) of cases had a regression > 1 standard deviation,
      of which the largest was 4.6% (1.2 standard deviations).  I think this
      is probably within the noise.
      
        https://drive.google.com/file/d/1xCUVwmiGD0heEMx5gcVEmLBI4eLaageV/view?usp=sharing
      
      One point to note is that the graphs show RLE appears to help very
      slightly with no zeros present! This is because the extra code causes
      the clang optimiser to change code layout in a way that happens to have
      a significant benefit.  Taking baseline LZO and adding a do-nothing line
      like "__builtin_prefetch(out_len);" immediately before the "goto next"
      has the same effect.  So this is a real, but basically spurious effect -
      it's small enough not to upset the overall findings.
      
      This patch (of 3):
      
      When using zram, we frequently encounter long runs of zero bytes.  This
      adds a special case which identifies runs of zeros and encodes them
      using run-length encoding.
      
      This is faster for both compression and decompression.  For high-entropy
      data which doesn't hit this case, impact is minimal.
      
      Compression ratio is within a few percent in all cases.
      
      This modifies the bitstream in a way which is backwards compatible
      (i.e., we can decompress old bitstreams, but old versions of lzo cannot
      decompress new bitstreams).
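
      To illustrate the idea of special-casing zero runs, a toy encoder (this
      is NOT the real lzo-rle bitstream, which stays LZO-compatible):

        #include <stddef.h>
        #include <stdint.h>

        /* Emit (0x00, run_length) for each run of zero bytes and copy other
         * bytes verbatim.  'out' must hold up to 2 * in_len bytes. */
        static size_t rle_zero_encode(const uint8_t *in, size_t in_len,
                                      uint8_t *out)
        {
                size_t i = 0, o = 0;

                while (i < in_len) {
                        if (in[i] == 0) {
                                size_t run = 1;

                                while (i + run < in_len && in[i + run] == 0 &&
                                       run < 255)
                                        run++;
                                out[o++] = 0x00;          /* zero-run marker */
                                out[o++] = (uint8_t)run;  /* run length 1..255 */
                                i += run;
                        } else {
                                out[o++] = in[i++];
                        }
                }
                return o;
        }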
      
      Link: http://lkml.kernel.org/r/20190205155944.16007-2-dave.rodgman@arm.com
      Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com>
      Cc: Matt Sealey <matt.sealey@arm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <nitingupta910@gmail.com>
      Cc: Richard Purdie <rpurdie@openedhand.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5ee4014a
    • M
      kernel/configs: use .incbin directive to embed config_data.gz · 13610aa9
      Masahiro Yamada authored
      This slightly optimizes the kernel/configs.c build.
      
      bin2c is not very efficient because it converts a data file into a huge
      array to embed it into a *.c file.
      
      Instead, we can use the .incbin directive.
      
      Also, this simplifies the code; Makefile is cleaner, and the way to get
      the offset/size of the config_data.gz is more straightforward.
      
      I used the "asm" statement in *.c instead of splitting it into *.S
      because MODULE_* tags are not supported in *.S files.
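
      A sketch of the technique (the symbol and file names here are
      illustrative, not the exact ones used in kernel/configs.c):

        /* Embed a binary file and expose start/end symbols to C code. */
        asm (
        "	.pushsection .rodata, \"a\"		\n"
        "	.global config_blob_start		\n"
        "config_blob_start:				\n"
        "	.incbin \"kernel/config_data.gz\"	\n"
        "	.global config_blob_end			\n"
        "config_blob_end:				\n"
        "	.popsection				\n"
        );

        extern const char config_blob_start[], config_blob_end[];
        /* blob size at runtime: config_blob_end - config_blob_start */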
      
      I also cleaned up kernel/.gitignore; "config_data.gz" is unneeded
      because the top-level .gitignore takes care of the "*.gz" pattern.
      
      [yamada.masahiro@socionext.com: v2]
        Link: http://lkml.kernel.org/r/1550108893-21226-1-git-send-email-yamada.masahiro@socionext.com
      Link: http://lkml.kernel.org/r/1549941160-8084-1-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      13610aa9
    • A
      configs: get rid of obsolete CONFIG_ENABLE_WARN_DEPRECATED · 3337d5cf
      Alexey Brodkin authored
      This Kconfig option was removed during v4.19 development in commit
      771c0353 ("deprecate the '__deprecated' attribute warnings entirely
      and for good"), so there's no point in keeping it in defconfigs any longer.
      
      FWIW defconfigs were patched with:
      --------------------------->8----------------------
      find . -name *_defconfig -exec sed -i '/CONFIG_ENABLE_WARN_DEPRECATED/d' {} \;
      --------------------------->8----------------------
      
      Link: http://lkml.kernel.org/r/20190128152434.41969-1-abrodkin@synopsys.com
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3337d5cf
  12. 07 Mar, 2019 5 commits
  13. 06 Mar, 2019 8 commits
    • A
      mm: remove zone_lru_lock() function, access ->lru_lock directly · f4b7e272
      Andrey Ryabinin authored
      We have a common pattern to access lru_lock from a page pointer:
      	zone_lru_lock(page_zone(page))

      Which is silly, because it unfolds to this:
      	&NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)].zone_pgdat->lru_lock
      while we can simply do:
      	&NODE_DATA(page_to_nid(page))->lru_lock

      Remove the zone_lru_lock() function, since it only complicates things.
      Use the 'page_pgdat(page)->lru_lock' pattern instead.
      
      [aryabinin@virtuozzo.com: a slightly better version of __split_huge_page()]
        Link: http://lkml.kernel.org/r/20190301121651.7741-1-aryabinin@virtuozzo.com
      Link: http://lkml.kernel.org/r/20190228083329.31892-2-aryabinin@virtuozzo.com
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f4b7e272
    • C
      mm: memcontrol: expose THP events on a per-memcg basis · 1ff9e6e1
      Chris Down authored
      Currently, THP allocation event data is fairly opaque, since you can
      only get it system-wide.  This patch makes it easier to reason about
      transparent hugepage behaviour on a per-memcg basis.
      
      For anonymous THP-backed pages, we already have MEMCG_RSS_HUGE in v1,
      which is used for v1's rss_huge [sic].  This is reused here as it's
      fairly involved to untangle NR_ANON_THPS right now to make it per-memcg,
      since right now some of this is delegated to rmap before we have any
      memcg actually assigned to the page.  It's a good idea to rework that,
      but let's leave untangling THP allocation for a future patch.
      
      [akpm@linux-foundation.org: fix build]
      [chris@chrisdown.name: fix memcontrol build when THP is disabled]
        Link: http://lkml.kernel.org/r/20190131160802.GA5777@chrisdown.name
      Link: http://lkml.kernel.org/r/20190129205852.GA7310@chrisdown.name
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1ff9e6e1
    • D
      mm: convert PG_balloon to PG_offline · ca215086
      David Hildenbrand authored
      PG_balloon was introduced to implement page migration/compaction for
      pages inflated in virtio-balloon.  Nowadays, it is only a marker that a
      page is part of virtio-balloon and therefore logically offline.
      
      We also want to make use of this flag in other balloon drivers - for
      inflated pages or when onlining a section but keeping some pages offline
      (e.g.  used right now by XEN and Hyper-V via set_online_page_callback()).
      
      We are going to expose this flag to dump tools like makedumpfile.  But
      instead of exposing PG_balloon, let's generalize the concept of marking
      pages as logically offline, so it can be reused for other purposes later
      on.
      
      Rename PG_balloon to PG_offline.  This is an indicator that the page is
      logically offline, the content stale and that it should not be touched
      (e.g.  a hypervisor would have to allocate backing storage in order for
      the guest to dump an unused page).  We can then e.g.  exclude such pages
      from dumps.
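
      A sketch of the intended usage in a balloon driver, assuming PageOffline()
      helpers along the lines this series introduces (names illustrative):

        /* page handed to the hypervisor: mark it logically offline */
        __SetPageOffline(page);

        /* ... later, when the page is returned to the guest ... */
        __ClearPageOffline(page);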
      
      We replace and reuse KPF_BALLOON (23), as this shouldn't really do any
      harm (and for now the semantics stay the same).  In following patches, we
      will make use of this bit also in other balloon drivers.  While at it,
      document PGTABLE.
      
      [akpm@linux-foundation.org: fix comment text, per David]
      Link: http://lkml.kernel.org/r/20181119101616.8901-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Pankaj gupta <pagupta@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Christian Hansen <chansen3@cisco.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Miles Chen <miles.chen@mediatek.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Julien Freche <jfreche@vmware.com>
      Cc: Kairui Song <kasong@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Lianbo Jiang <lijiang@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xavier Deguillard <xdeguillard@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca215086
    • C
      f2fs: fix to document inline_xattr_size option · 7321dd97
      Chao Yu authored
      We missed documenting the inline_xattr_size mount option in f2fs.txt;
      add it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      7321dd97
    • M
      dm cache: add support for discard passdown to the origin device · de7180ff
      Mike Snitzer authored
      DM cache now defaults to passing discards down to the origin device.
      User may disable this using the "no_discard_passdown" feature when
      creating the cache device.
      
      If the cache's underlying origin device doesn't support discards, then
      passdown is disabled (with a warning).  Similarly, if the underlying
      origin device's max_discard_sectors is less than a cache block, discard
      passdown will be disabled (this is required because sizing of the cache's
      internal discard bitset depends on it).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      de7180ff
    • H
      dm: add support to directly boot to a mapped device · 6bbc923d
      Helen Koike authored
      Add a "create" module parameter, which allows device-mapper targets to
      be configured at boot time. This enables early use of DM targets in the
      boot process (as the root device or otherwise) without the need of an
      initramfs.
      
      The syntax used in the boot param is based on the concise format from
      the dmsetup tool to follow the rule of least surprise:
      
      	dmsetup table --concise /dev/mapper/lroot
      
      Which is:
      	dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
      
      Where,
      	<name>		::= The device name.
      	<uuid>		::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
      	<minor>		::= The device minor number | ""
      	<flags>		::= "ro" | "rw"
      	<table>		::= <start_sector> <num_sectors> <target_type> <target_args>
      	<target_type>	::= "verity" | "linear" | ...
      
      For example, the following could be added in the boot parameters:
      dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
      
      Only targets that were tested are allowed, and only those that don't
      change any block device when the device is created as read-only.  For
      example, mirror and cache targets are not allowed.  The rationale behind
      this is that if the user makes a mistake, choosing the wrong device to
      be the mirror or the cache can corrupt data.
      
      The only targets initially allowed are:
      * crypt
      * delay
      * linear
      * snapshot-origin
      * striped
      * verity
      Co-developed-by: Will Drewry <wad@chromium.org>
      Co-developed-by: Kees Cook <keescook@chromium.org>
      Co-developed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
      Signed-off-by: Helen Koike <helen.koike@collabora.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      6bbc923d
    • J
      Documentation: modern versions of ceph are not backed by btrfs · d11ae8e0
      Jeff Layton authored
      [ Update the links too. ]
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      d11ae8e0
    • Y
      ceph: add mount option to limit caps count · fe33032d
      Yan, Zheng authored
      If the number of caps exceeds the limit, ceph_trim_dentries() also trims
      dentries with valid leases.  Trimming a dentry releases references to the
      associated inode, which may evict the inode and release caps.

      By default, there is no limit on the caps count.
      Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      fe33032d
  14. 05 Mar, 2019 5 commits