1. 03 10月, 2018 2 次提交
  2. 19 9月, 2018 3 次提交
    • L
      PCI: hotplug: Embed hotplug_slot · 125450f8
      Lukas Wunner 提交于
      When the PCI hotplug core and its first user, cpqphp, were introduced in
      February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot
      struct for its internal use plus a hotplug_slot struct to be registered
      with the hotplug core and linked the two with pointers:
      https://git.kernel.org/tglx/history/c/a8a2069f432c
      
      Nowadays, the predominant pattern in the tree is to embed ("subclass")
      such structures in one another and cast to the containing struct with
      container_of().  But it wasn't until July 2002 that container_of() was
      introduced with historic commit ec4f214232cf:
      https://git.kernel.org/tglx/history/c/ec4f214232cf
      
      pnv_php, introduced in 2016, did the right thing and embedded struct
      hotplug_slot in its internal struct pnv_php_slot, but all other drivers
      cargo-culted cpqphp's design and linked separate structs with pointers.
      
      Embedding structs is preferrable to linking them with pointers because
      it requires fewer allocations, thereby reducing overhead and simplifying
      error paths.  Casting an embedded struct to the containing struct
      becomes a cheap subtraction rather than a dereference.  And having fewer
      pointers reduces the risk of them pointing nowhere either accidentally
      or due to an attack.
      
      Convert all drivers to embed struct hotplug_slot in their internal slot
      struct.  The "private" pointer in struct hotplug_slot thereby becomes
      unused, so drop it.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>  # drivers/pci/hotplug/rpa*
      Acked-by: Sebastian Ott <sebott@linux.ibm.com>        # drivers/pci/hotplug/s390*
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
      Cc: Len Brown <lenb@kernel.org>
      Cc: Scott Murray <scott@spiteful.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Corentin Chary <corentin.chary@gmail.com>
      Cc: Darren Hart <dvhart@infradead.org>
      125450f8
    • L
      PCI: hotplug: Drop hotplug_slot_info · a7da2161
      Lukas Wunner 提交于
      Ever since the PCI hotplug core was introduced in 2002, drivers had to
      allocate and register a struct hotplug_slot_info for every slot:
      https://git.kernel.org/tglx/history/c/a8a2069f432c
      
      Apparently the idea was that drivers furnish the hotplug core with an
      up-to-date card presence status, power status, latch status and
      attention indicator status as well as notify the hotplug core of changes
      thereof.  However only 4 out of 12 hotplug drivers bother to notify the
      hotplug core with pci_hp_change_slot_info() and the hotplug core never
      made any use of the information:  There is just a single macro in
      pci_hotplug_core.c, GET_STATUS(), which uses the hotplug_slot_info if
      the driver lacks the corresponding callback in hotplug_slot_ops.  The
      macro is called when the user reads the attribute via sysfs.
      
      Now, if the callback isn't defined, the attribute isn't exposed in sysfs
      in the first place (see e.g. has_power_file()).  There are only two
      situations when the hotplug_slot_info would actually be accessed:
      
      * If the driver defines ->enable_slot or ->disable_slot but not
        ->get_power_status.
      
      * If the driver defines ->set_attention_status but not
        ->get_attention_status.
      
      There is no driver doing the former and just a single driver doing the
      latter, namely pnv_php.c.  Amend it with a ->get_attention_status
      callback.  With that, the hotplug_slot_info becomes completely unused by
      the PCI hotplug core.  But a few drivers use it internally as a cache:
      
      cpcihp uses it to cache the latch_status and adapter_status.
      cpqhp uses it to cache the adapter_status.
      pnv_php and rpaphp use it to cache the attention_status.
      shpchp uses it to cache all four values.
      
      Amend these drivers to cache the information in their private slot
      struct.  shpchp's slot struct already contains members to cache the
      power_status and adapter_status, so additional members are only needed
      for the other two values.  In the case of cpqphp, the cached value is
      only accessed in a single place, so instead of caching it, read the
      current value from the hardware.
      
      Caution:  acpiphp, cpci, cpqhp, shpchp, asus-wmi and eeepc-laptop
      populate the hotplug_slot_info with initial values on probe.  That code
      is herewith removed.  There is a theoretical chance that the code has
      side effects without which the driver fails to function, e.g. if the
      ACPI method to read the adapter status needs to be executed at least
      once on probe.  That seems unlikely to me, still maintainers should
      review the changes carefully for this possibility.
      
      Rafael adds: "I'm not aware of any case in which it will break anything,
      [...] but if that happens, it may be necessary to add the execution of
      the control methods in question directly to the initialization part."
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>  # drivers/pci/hotplug/rpa*
      Acked-by: Sebastian Ott <sebott@linux.ibm.com>        # drivers/pci/hotplug/s390*
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
      Cc: Len Brown <lenb@kernel.org>
      Cc: Scott Murray <scott@spiteful.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Corentin Chary <corentin.chary@gmail.com>
      Cc: Darren Hart <dvhart@infradead.org>
      a7da2161
    • L
      PCI: hotplug: Constify hotplug_slot_ops · 81c4b5bf
      Lukas Wunner 提交于
      Hotplug drivers cannot declare their hotplug_slot_ops const, making them
      attractive targets for attackers, because upon registration of a hotplug
      slot, __pci_hp_initialize() writes to the "owner" and "mod_name" members
      in that struct.
      
      Fix by moving these members to struct hotplug_slot and constify every
      driver's hotplug_slot_ops except for pciehp.
      
      pciehp constructs its hotplug_slot_ops at runtime based on the PCIe
      port's capabilities, hence cannot declare them const.  It can be
      converted to __write_rarely once that's mainlined:
      http://www.openwall.com/lists/kernel-hardening/2016/11/16/3Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>  # drivers/pci/hotplug/rpa*
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
      Cc: Len Brown <lenb@kernel.org>
      Cc: Scott Murray <scott@spiteful.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Corentin Chary <corentin.chary@gmail.com>
      Cc: Darren Hart <dvhart@infradead.org>
      81c4b5bf
  3. 14 9月, 2018 3 次提交
    • M
      xen/balloon: add runtime control for scrubbing ballooned out pages · 197ecb38
      Marek Marczykowski-Górecki 提交于
      Scrubbing pages on initial balloon down can take some time, especially
      in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
      started with memory= significantly lower than maxmem=, all the extra
      pages will be scrubbed before returning to Xen. But since most of them
      weren't used at all at that point, Xen needs to populate them first
      (from populate-on-demand pool). In nested virt case (Xen inside KVM)
      this slows down the guest boot by 15-30s with just 1.5GB needed to be
      returned to Xen.
      
      Add runtime parameter to enable/disable it, to allow initially disabling
      scrubbing, then enable it back during boot (for example in initramfs).
      Such usage relies on assumption that a) most pages ballooned out during
      initial boot weren't used at all, and b) even if they were, very few
      secrets are in the guest at that time (before any serious userspace
      kicks in).
      Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
      enabled by default), controlling default value for the new runtime
      switch.
      Signed-off-by: NMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      197ecb38
    • A
      asm-generic: io: Fix ioport_map() for !CONFIG_GENERIC_IOMAP && CONFIG_INDIRECT_PIO · 500dd232
      Andrew Murray 提交于
      The !CONFIG_GENERIC_IOMAP version of ioport_map uses MMIO_UPPER_LIMIT to
      prevent users from making I/O accesses outside the expected I/O range -
      however it erroneously treats MMIO_UPPER_LIMIT as a mask which is
      contradictory to its other users.
      
      The introduction of CONFIG_INDIRECT_PIO, which subtracts an arbitrary
      amount from IO_SPACE_LIMIT to form MMIO_UPPER_LIMIT, results in ioport_map
      mangling the given port rather than capping it.
      
      We address this by aligning more closely with the CONFIG_GENERIC_IOMAP
      implementation of ioport_map by using the comparison operator and
      returning NULL where the port exceeds MMIO_UPPER_LIMIT. Though note that
      we preserve the existing behavior of masking with IO_SPACE_LIMIT such that
      we don't break existing buggy drivers that somehow rely on this masking.
      
      Fixes: 5745392e ("PCI: Apply the new generic I/O management on PCI IO hosts")
      Reported-by: NWill Deacon <will.deacon@arm.com>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Murray <andrew.murray@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      500dd232
    • L
      mm: get rid of vmacache_flush_all() entirely · 7a9cdebd
      Linus Torvalds 提交于
      Jann Horn points out that the vmacache_flush_all() function is not only
      potentially expensive, it's buggy too.  It also happens to be entirely
      unnecessary, because the sequence number overflow case can be avoided by
      simply making the sequence number be 64-bit.  That doesn't even grow the
      data structures in question, because the other adjacent fields are
      already 64-bit.
      
      So simplify the whole thing by just making the sequence number overflow
      case go away entirely, which gets rid of all the complications and makes
      the code faster too.  Win-win.
      
      [ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics
        also just goes away entirely with this ]
      Reported-by: NJann Horn <jannh@google.com>
      Suggested-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NDavidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7a9cdebd
  4. 12 9月, 2018 2 次提交
  5. 10 9月, 2018 1 次提交
  6. 06 9月, 2018 3 次提交
  7. 05 9月, 2018 4 次提交
  8. 04 9月, 2018 2 次提交
  9. 03 9月, 2018 2 次提交
  10. 01 9月, 2018 2 次提交
    • D
      blkcg: delay blkg destruction until after writeback has finished · 59b57717
      Dennis Zhou (Facebook) 提交于
      Currently, blkcg destruction relies on a sequence of events:
        1. Destruction starts. blkcg_css_offline() is called and blkgs
           release their reference to the blkcg. This immediately destroys
           the cgwbs (writeback).
        2. With blkgs giving up their reference, the blkcg ref count should
           become zero and eventually call blkcg_css_free() which finally
           frees the blkcg.
      
      Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
      and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
      on the completion of all writeback associated with the blkcg. A count of
      the number of cgwbs is maintained and once that goes to zero, blkg
      destruction can follow. This should prevent premature blkg destruction
      related to writeback.
      
      The new process for blkcg cleanup is as follows:
        1. Destruction starts. blkcg_css_offline() is called which offlines
           writeback. Blkg destruction is delayed on the cgwb_refcnt count to
           avoid punting potentially large amounts of outstanding writeback
           to root while maintaining any ongoing policies. Here, the base
           cgwb_refcnt is put back.
        2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
           and handles destruction of blkgs. This is where the css reference
           held by each blkg is released.
        3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
           This finally frees the blkg.
      
      It seems in the past blk-throttle didn't do the most understandable
      things with taking data from a blkg while associating with current. So,
      the simplification and unification of what blk-throttle is doing caused
      this.
      
      Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      59b57717
    • D
      Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()" · 6b065462
      Dennis Zhou (Facebook) 提交于
      This reverts commit 4c699480.
      
      Destroying blkgs is tricky because of the nature of the relationship. A
      blkg should go away when either a blkcg or a request_queue goes away.
      However, blkg's pin the blkcg to ensure they remain valid. To break this
      cycle, when a blkcg is offlined, blkgs put back their css ref. This
      eventually lets css_free() get called which frees the blkcg.
      
      The above commit (4c699480) breaks this order of events by trying to
      destroy blkgs in css_free(). As the blkgs still hold references to the
      blkcg, css_free() is never called.
      
      The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
      addressed in the following patch by delaying destruction of a blkg until
      all writeback associated with the blkcg has been finished.
      
      Fixes: 4c699480 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6b065462
  11. 31 8月, 2018 3 次提交
  12. 30 8月, 2018 2 次提交
    • A
      vfs: add the fadvise() file operation · 45cd0faa
      Amir Goldstein 提交于
      This is going to be used by overlayfs and possibly useful
      for other filesystems.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      45cd0faa
    • M
      arm/arm64: smccc-1.1: Handle function result as parameters · 755a8bf5
      Marc Zyngier 提交于
      If someone has the silly idea to write something along those lines:
      
      	extern u64 foo(void);
      
      	void bar(struct arm_smccc_res *res)
      	{
      		arm_smccc_1_1_smc(0xbad, foo(), res);
      	}
      
      they are in for a surprise, as this gets compiled as:
      
      	0000000000000588 <bar>:
      	 588:   a9be7bfd        stp     x29, x30, [sp, #-32]!
      	 58c:   910003fd        mov     x29, sp
      	 590:   f9000bf3        str     x19, [sp, #16]
      	 594:   aa0003f3        mov     x19, x0
      	 598:   aa1e03e0        mov     x0, x30
      	 59c:   94000000        bl      0 <_mcount>
      	 5a0:   94000000        bl      0 <foo>
      	 5a4:   aa0003e1        mov     x1, x0
      	 5a8:   d4000003        smc     #0x0
      	 5ac:   b4000073        cbz     x19, 5b8 <bar+0x30>
      	 5b0:   a9000660        stp     x0, x1, [x19]
      	 5b4:   a9010e62        stp     x2, x3, [x19, #16]
      	 5b8:   f9400bf3        ldr     x19, [sp, #16]
      	 5bc:   a8c27bfd        ldp     x29, x30, [sp], #32
      	 5c0:   d65f03c0        ret
      	 5c4:   d503201f        nop
      
      The call to foo "overwrites" the x0 register for the return value,
      and we end up calling the wrong secure service.
      
      A solution is to evaluate all the parameters before assigning
      anything to specific registers, leading to the expected result:
      
      	0000000000000588 <bar>:
      	 588:   a9be7bfd        stp     x29, x30, [sp, #-32]!
      	 58c:   910003fd        mov     x29, sp
      	 590:   f9000bf3        str     x19, [sp, #16]
      	 594:   aa0003f3        mov     x19, x0
      	 598:   aa1e03e0        mov     x0, x30
      	 59c:   94000000        bl      0 <_mcount>
      	 5a0:   94000000        bl      0 <foo>
      	 5a4:   aa0003e1        mov     x1, x0
      	 5a8:   d28175a0        mov     x0, #0xbad
      	 5ac:   d4000003        smc     #0x0
      	 5b0:   b4000073        cbz     x19, 5bc <bar+0x34>
      	 5b4:   a9000660        stp     x0, x1, [x19]
      	 5b8:   a9010e62        stp     x2, x3, [x19, #16]
      	 5bc:   f9400bf3        ldr     x19, [sp, #16]
      	 5c0:   a8c27bfd        ldp     x29, x30, [sp], #32
      	 5c4:   d65f03c0        ret
      Reported-by: NJulien Grall <julien.grall@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      755a8bf5
  13. 29 8月, 2018 3 次提交
    • J
      of: add helper to lookup compatible child node · 36156f92
      Johan Hovold 提交于
      Add of_get_compatible_child() helper that can be used to lookup
      compatible child nodes.
      
      Several drivers currently use of_find_compatible_node() to lookup child
      nodes while failing to notice that the of_find_ functions search the
      entire tree depth-first (from a given start node) and therefore can
      match unrelated nodes. The fact that these functions also drop a
      reference to the node they start searching from (e.g. the parent node)
      is typically also overlooked, something which can lead to use-after-free
      bugs.
      Signed-off-by: NJohan Hovold <johan@kernel.org>
      Signed-off-by: NRob Herring <robh@kernel.org>
      36156f92
    • F
      netfilter: nf_tables: rework ct timeout set support · 0434ccdc
      Florian Westphal 提交于
      Using a private template is problematic:
      
      1. We can't assign both a zone and a timeout policy
         (zone assigns a conntrack template, so we hit problem 1)
      2. Using a template needs to take care of ct refcount, else we'll
         eventually free the private template due to ->use underflow.
      
      This patch reworks template policy to instead work with existing conntrack.
      
      As long as such conntrack has not yet been placed into the hash table
      (unconfirmed) we can still add the timeout extension.
      
      The only caveat is that we now need to update/correct ct->timeout to
      reflect the initial/new state, otherwise the conntrack entry retains the
      default 'new' timeout.
      
      Side effect of this change is that setting the policy must
      now occur from chains that are evaluated *after* the conntrack lookup
      has taken place.
      
      No released kernel contains the timeout policy feature yet, so this change
      should be ok.
      
      Changes since v2:
       - don't handle 'ct is confirmed case'
       - after previous patch, no need to special-case tcp/dccp/sctp timeout
         anymore
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      0434ccdc
    • M
      arm/arm64: smccc-1.1: Make return values unsigned long · 1d8f5747
      Marc Zyngier 提交于
      An unfortunate consequence of having a strong typing for the input
      values to the SMC call is that it also affects the type of the
      return values, limiting r0 to 32 bits and r{1,2,3} to whatever
      was passed as an input.
      
      Let's turn everything into "unsigned long", which satisfies the
      requirements of both architectures, and allows for the full
      range of return values.
      Reported-by: NJulien Grall <julien.grall@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      1d8f5747
  14. 28 8月, 2018 1 次提交
    • S
      cfg80211: make wmm_rule part of the reg_rule structure · 38cb87ee
      Stanislaw Gruszka 提交于
      Make wmm_rule be part of the reg_rule structure. This simplifies the
      code a lot at the cost of having bigger memory usage. However in most
      cases we have only few reg_rule's and when we do have many like in
      iwlwifi we do not save memory as it allocates a separate wmm_rule for
      each channel anyway.
      
      This also fixes a bug reported in various places where somewhere the
      pointers were corrupted and we ended up doing a null-dereference.
      
      Fixes: 230ebaa1 ("cfg80211: read wmm rules from regulatory database")
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      [rephrase commit message slightly]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      38cb87ee
  15. 27 8月, 2018 1 次提交
  16. 24 8月, 2018 6 次提交