1. 24 May, 2018: 1 commit
    • cgroup: css_set_lock should nest inside tasklist_lock · d8742e22
      Committed by Tejun Heo
      cgroup_enable_task_cg_lists() incorrectly nests non-irq-safe
      tasklist_lock inside irq-safe css_set_lock triggering the following
      lockdep warning.
      
        WARNING: possible irq lock inversion dependency detected
        4.17.0-rc1-00027-gb37d049 #6 Not tainted
        --------------------------------------------------------
        systemd/1 just changed the state of lock:
        00000000fe57773b (css_set_lock){..-.}, at: cgroup_free+0xf2/0x12a
        but this lock took another, SOFTIRQ-unsafe lock in the past:
         (tasklist_lock){.+.+}
      
        and interrupts could create inverse lock ordering between them.
      
        other info that might help us debug this:
         Possible interrupt unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(tasklist_lock);
                                       local_irq_disable();
                                       lock(css_set_lock);
                                       lock(tasklist_lock);
          <Interrupt>
            lock(css_set_lock);
      
         *** DEADLOCK ***
      
      The condition is highly unlikely to actually happen, especially given
      that the path is executed only once per boot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Boqun Feng <boqun.feng@gmail.com>
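For illustration only, the ordering rule in the title can be checked with a toy lock-order tracker in the spirit of lockdep. This is a userspace sketch under stated assumptions: the lock IDs and the acquire()/release_all() helpers are hypothetical, and unlike the real lockdep it ignores IRQ/softirq state, reader/writer semantics, and transitive dependency chains. It records an "outer before inner" edge whenever a lock is taken while another is held, and reports an inversion when the opposite edge was seen earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IDs for the two locks named in the warning above. */
enum { TASKLIST_LOCK, CSS_SET_LOCK, NR_LOCKS };

/* seen[a][b] is true once lock b has been taken while a was held. */
static bool seen[NR_LOCKS][NR_LOCKS];
static int held[NR_LOCKS];	/* currently held locks, outermost first */
static int nr_held;

/*
 * Record the dependency edges created by taking @lock and return false
 * if any of them inverts an order observed earlier.
 */
static bool acquire(int lock)
{
	bool ok = true;

	assert(nr_held < NR_LOCKS);
	for (int i = 0; i < nr_held; i++) {
		int outer = held[i];

		if (seen[lock][outer])	/* opposite order seen before */
			ok = false;
		seen[outer][lock] = true;
	}
	held[nr_held++] = lock;
	return ok;
}

/* Drop every held lock (release order does not matter in this model). */
static void release_all(void)
{
	nr_held = 0;
}
```

Following the commit's rule, every path takes TASKLIST_LOCK first and CSS_SET_LOCK inside it, so only one edge direction is ever recorded. In the real report the tasklist_lock before css_set_lock edge arises from an interrupt arriving while tasklist_lock is held, which this sketch does not model; feeding it both explicit orderings reproduces the kind of inversion lockdep complains about.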
  2. 08 May, 2018: 1 commit
  3. 27 April, 2018: 15 commits
    • cgroup: Make cgroup_rstat_updated() ready for root cgroup usage · c43c5ea7
      Committed by Tejun Heo
      cgroup_rstat_updated() ensures that the cgroup's rstat is linked to
      the parent.  If there's no parent, it never gets linked and the
      function ends up grabbing and releasing the cgroup_rstat_lock each
      time for no reason, which can be expensive.
      
      This hasn't been a problem until now because nobody was calling the
      function for the root cgroup, but rstat is going to be exposed to
      controllers and other use cases, so let's get ready.  Make
      cgroup_rstat_updated() a no-op for the root cgroup.
      Signed-off-by: Tejun Heo <tj@kernel.org>
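A userspace model of the early return can make the cost argument concrete. It is a sketch under assumptions: struct cg, its on_updated_list flag, and the lock_taken counter are hypothetical stand-ins (the real code tracks per-CPU updated_next pointers and takes the lock the message mentions), so only the shape of the check is meant to carry over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cgroup. */
struct cg {
	struct cg *parent;	/* NULL for the root cgroup */
	bool on_updated_list;	/* already linked toward the parent? */
};

static int lock_taken;	/* how often the slow path grabbed the lock */

static void rstat_updated(struct cg *cgrp)
{
	assert(cgrp != NULL);

	/* Root has no parent to link to, so there is never anything to do. */
	if (!cgrp->parent)
		return;

	/* Fast path: already linked since the last flush. */
	if (cgrp->on_updated_list)
		return;

	/* Slow path: stands in for locking and linking into the parent. */
	lock_taken++;
	cgrp->on_updated_list = true;
}
```

In the kernel the root is never actually linked (it has no parent), so its fast-path check can never succeed and every call would reach the lock; the early parent check skips that work entirely.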
    • cgroup: Add memory barriers to plug cgroup_rstat_updated() race window · 9a9e97b2
      Committed by Tejun Heo
      cgroup_rstat_updated() has a small race window where an update
      signal can race with a flush and be lost until the next update.
      This wasn't a problem for the existing usages, but we plan to use
      rstat to track counters which need to be accurate.
      
      This patch plugs the race window by synchronizing
      cgroup_rstat_updated() and flush path with memory barriers around
      cgroup_rstat_cpu->updated_next pointer.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Add cgroup_subsys->css_rstat_flush() · 8f53470b
      Committed by Tejun Heo
      This patch adds cgroup_subsys->css_rstat_flush().  If a subsystem has
      this callback, its csses are linked on cgrp->css_rstat_list and rstat
      will call the function whenever the associated cgroup is flushed.
      Flush is also performed when such csses are released so that residual
      counts aren't lost.
      
      Combined with the rstat API previous patches factored out, this allows
      controllers to plug into rstat to manage their statistics in a
      scalable way.
      Signed-off-by: Tejun Heo <tj@kernel.org>
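The optional-callback pattern described above can be sketched as follows. This is a userspace model under assumptions: struct subsys, struct css, struct cg, cgrp_link_css(), cgrp_flush_cpu() and demo_css_rstat_flush() are hypothetical, the (css, cpu) callback arguments are an assumed signature, and a singly linked list replaces the kernel's list handling. The flush on css release that the commit also describes is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct css;

struct subsys {
	const char *name;
	/* Optional; NULL when the controller does not use rstat. */
	void (*css_rstat_flush)(struct css *css, int cpu);
};

struct css {
	const struct subsys *ss;
	long nr_flushes;	/* example statistic kept by the controller */
	struct css *next;	/* link on the cgroup's css_rstat_list */
};

struct cg {
	struct css *css_rstat_list;	/* csses whose subsys has the callback */
};

/* Only csses of subsystems that implement the callback get linked. */
static void cgrp_link_css(struct cg *cgrp, struct css *css)
{
	assert(css->ss != NULL);
	if (!css->ss->css_rstat_flush)
		return;
	css->next = cgrp->css_rstat_list;
	cgrp->css_rstat_list = css;
}

/* Flushing a cgroup on @cpu invokes every linked css's callback. */
static void cgrp_flush_cpu(struct cg *cgrp, int cpu)
{
	for (struct css *css = cgrp->css_rstat_list; css; css = css->next)
		css->ss->css_rstat_flush(css, cpu);
}

/* Example controller callback: just counts how often it ran. */
static void demo_css_rstat_flush(struct css *css, int cpu)
{
	(void)cpu;	/* a real controller would fold @cpu's counters here */
	css->nr_flushes++;
}
```

Linking only the csses that opt in keeps the flush loop free of controllers that have nothing to aggregate.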
    • cgroup: Replace cgroup_rstat_mutex with a spinlock · 0fa294fb
      Committed by Tejun Heo
      Currently, the rstat flush path is protected with a mutex, which is fine
      as all the existing users are from the interface file show path.  However,
      rstat is being generalized for use by controllers and flushing from
      atomic contexts will be necessary.
      
      This patch replaces cgroup_rstat_mutex with a spinlock and adds an
      irq-safe flush function, cgroup_rstat_flush_irqsafe().  Explicit
      yield handling is added to the flush path so that other flush
      functions can yield to other threads and flushers.
      Signed-off-by: Tejun Heo <tj@kernel.org>
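The shape of a spinlock-protected flush with explicit yield points can be sketched in userspace. Assumptions: a C11 atomic_flag stands in for the kernel spinlock, should_yield() is a hypothetical placeholder for a real contention or reschedule check (here it fires after every odd-numbered CPU), the per-CPU deltas are plain array slots, and the example is driven from a single thread. IRQ handling is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_DEMO_CPUS 4

static atomic_flag flush_lock = ATOMIC_FLAG_INIT;	/* the "spinlock" */
static long pcpu_delta[NR_DEMO_CPUS];	/* per-CPU pending deltas */
static long total;	/* flushed, aggregated value */
static int nr_yields;	/* how often the lock was dropped mid-flush */

static void demo_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* busy-wait until the holder clears the flag */
}

static void demo_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Single-threaded helper for the example; not concurrency-safe. */
static void add_delta(int cpu, long v)
{
	assert(cpu >= 0 && cpu < NR_DEMO_CPUS);
	pcpu_delta[cpu] += v;
}

/* Hypothetical "someone else wants the CPU or the lock" check. */
static bool should_yield(int cpu)
{
	return cpu % 2 == 1;
}

/* Called with flush_lock held; may drop and retake it between CPUs. */
static void flush_locked(void)
{
	for (int cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
		total += pcpu_delta[cpu];
		pcpu_delta[cpu] = 0;

		if (cpu + 1 < NR_DEMO_CPUS && should_yield(cpu)) {
			nr_yields++;
			demo_spin_unlock(&flush_lock);
			/* other threads and flushers could run here */
			demo_spin_lock(&flush_lock);
		}
	}
}

static void rstat_flush_demo(void)
{
	demo_spin_lock(&flush_lock);
	flush_locked();
	demo_spin_unlock(&flush_lock);
}
```

In this model, dropping the lock between CPUs bounds how long any one flusher holds it; the trade-off is that a flush is no longer a single atomic snapshot across all CPUs.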
    • cgroup: Factor out and expose cgroup_rstat_*() interface functions · 6162cef0
      Committed by Tejun Heo
      cgroup_rstat is being generalized so that controllers can use it too.
      This patch factors out and exposes the following interface functions.
      
      * cgroup_rstat_updated(): Renamed from cgroup_rstat_cpu_updated() for
        consistency.
      
      * cgroup_rstat_flush_hold/release(): Factored out from base stat
        implementation.
      
      * cgroup_rstat_flush(): Exposed verbatim.
      
      While at it, drop assert on cgroup_rstat_mutex in
      cgroup_base_stat_flush() as it crosses layers and make a minor comment
      update.
      
      v2: Added EXPORT_SYMBOL_GPL(cgroup_rstat_updated) to fix a build bug.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Reorganize kernel/cgroup/rstat.c · a17556f8
      Committed by Tejun Heo
      Currently, rstat.c has rstat and base stat implementations intermixed.
      Collect base stat implementation at the end of the file.  Also,
      reorder the prototypes.
      
      This patch doesn't make any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Distinguish base resource stat implementation from rstat · d4ff749b
      Committed by Tejun Heo
      Base resource stat accounts for universal (not specific to any
      controller) resource consumption on top of rstat.  Currently, its
      implementation is intermixed with rstat implementation making the code
      confusing to follow.
      
      This patch clarifies the distinction by doing the following.
      
      * Encapsulate base resource stat counters, currently only cputime, in
        struct cgroup_base_stat.
      
      * Move prev_cputime into struct cgroup and initialize it with cgroup.
      
      * Rename the related functions so that they start with cgroup_base_stat.
      
      * Prefix the related variables and field names with b.
      
      This patch doesn't make any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Rename stat to rstat · c58632b3
      Committed by Tejun Heo
      stat is too generic a name and ends up causing subtle confusion.
      It'll be made generic so that controllers can plug into it, which will
      make the problem worse.  Let's rename it to something more specific -
      cgroup_rstat for cgroup recursive stat.
      
      This patch does the following renames.  No other changes.
      
      * cpu_stat	-> rstat_cpu
      * stat		-> rstat
      * ?cstat	-> ?rstatc
      
      Note that the renames are selective.  The ones left unrenamed
      implement basic resource statistics on top of rstat.  This will be
      further cleaned up in the following patches.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c · a5c2b93f
      Committed by Tejun Heo
      stat is too generic a name and ends up causing subtle confusion.
      It'll be made generic so that controllers can plug into it, which will
      make the problem worse.  Let's rename it to something more specific -
      cgroup_rstat for cgroup recursive stat.
      
      First, rename kernel/cgroup/stat.c to kernel/cgroup/rstat.c.  No
      content changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Limit event generation frequency · b12e3583
      Committed by Tejun Heo
      ".events" files generate file-modified events to notify userland of
      possible new events.  Some of the events can be quite bursty
      (e.g. memory high events), and generating a notification each time is
      costly and pointless.
      
      This patch implements an event rate-limit mechanism.  If a new
      notification is requested before 10ms has passed since the previous
      notification, the new notification is delayed until then.
      
      As this only delays from the second notification on in a given close
      cluster of notifications, userland reactions to notifications
      shouldn't be delayed at all in most cases while avoiding notification
      storms.
      Signed-off-by: Tejun Heo <tj@kernel.org>
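The "delay only from the second notification on" rule can be sketched with explicit timestamps. This is a model under assumptions, not the patch: struct event_file, event_request() and event_tick() are hypothetical, time is passed in as milliseconds, and event_tick() stands in for whatever timer fires the delayed notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVENT_MIN_INTERVAL_MS 10	/* minimum gap between notifications */

/* Hypothetical per-file notification state. */
struct event_file {
	bool notified_before;	/* has any notification been sent yet? */
	int64_t last_ms;	/* when the previous notification was sent */
	int64_t pending_ms;	/* when a delayed one is due, or -1 */
	int sent;		/* notifications actually delivered */
};

static void event_file_init(struct event_file *f)
{
	f->notified_before = false;
	f->last_ms = 0;
	f->pending_ms = -1;
	f->sent = 0;
}

static void notify_now(struct event_file *f, int64_t now_ms)
{
	f->notified_before = true;
	f->last_ms = now_ms;
	f->pending_ms = -1;	/* anything pending is covered by this one */
	f->sent++;
}

/* A new event happened at @now_ms; notify now or schedule for later. */
static void event_request(struct event_file *f, int64_t now_ms)
{
	assert(!f->notified_before || now_ms >= f->last_ms);

	if (!f->notified_before ||
	    now_ms - f->last_ms >= EVENT_MIN_INTERVAL_MS) {
		notify_now(f, now_ms);	/* first of a cluster: no delay */
		return;
	}
	/* Too soon: coalesce into one notification at last + interval. */
	if (f->pending_ms < 0)
		f->pending_ms = f->last_ms + EVENT_MIN_INTERVAL_MS;
}

/* Stand-in for a timer: deliver the delayed notification when due. */
static void event_tick(struct event_file *f, int64_t now_ms)
{
	if (f->pending_ms >= 0 && now_ms >= f->pending_ms)
		notify_now(f, now_ms);
}
```

With this model, requests at 0, 2 and 5 ms produce one immediate notification at 0 ms and a single delayed one at 10 ms, while a request at 30 ms, after a quiet period, is delivered immediately.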
    • cgroup: Explicitly remove core interface files · 5faaf05f
      Committed by Tejun Heo
      The "cgroup." core interface files bypass the usual interface removal
      path and get removed recursively along with the cgroup itself.  While
      this works now, the subtle discrepancy gets in the way of implementing
      common mechanisms.
      
      This patch updates cgroup core interface file handling so that it's
      consistent with controller interface files.  When the files are added,
      the css is marked CSS_VISIBLE, and the files are explicitly removed
      before the cgroup is destroyed.
      
      This doesn't cause user-visible behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • Merge tag 'acpi-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fe03a759
      Committed by Linus Torvalds
      Pull ACPI fixes from Rafael Wysocki:
       "These are two watchdog-related fixes, fix for a backlight regression
        from the 4.16 cycle that unfortunately was propagated to -stable and a
        button module modification to prevent graphics driver modules from
        failing to load due to unmet dependencies if ACPI is disabled from the
        kernel command line.
      
        Specifics:
      
         - Change the ACPI subsystem initialization ordering to initialize the
           WDAT watchdog before reserving PNP motherboard resources so as to
           allow the watchdog to allocate its resources before the PNP code
           gets to them and prevents it from working correctly (Mika
           Westerberg).
      
         - Add a quirk for Lenovo Z50-70 to use the iTCO watchdog instead of
           the WDAT one which conflicts with the RTC on that platform (Mika
           Westerberg).
      
         - Avoid breaking backlight handling on Dell XPS 13 2013 model by
           allowing laptops to use the ACPI backlight by default even if they
           are Windows 8-ready in principle (Hans de Goede)"
      
      * tag 'acpi-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / video: Only default only_lcd to true on Win8-ready _desktops_
        ACPI / button: make module loadable when booted in non-ACPI mode
        ACPI / watchdog: Prefer iTCO_wdt on Lenovo Z50-70
        ACPI / scan: Initialize watchdog before PNP
    • Merge tag 'pm-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e58d911f
      Committed by Linus Torvalds
      Pull power management fixes from Rafael Wysocki:
       "These are a Low Power S0 Idle quirk, a hibernation handling fix for
        the PCI bus type and a brcmstb-avs-cpufreq driver fixup removing
        development debug code from it.
      
        Specifics:
      
         - Blacklist the Low Power S0 Idle _DSM on ThinkPad X1 Tablet(2016)
           where it causes issues and make it use ACPI S3 which works instead
           of the non-working suspend-to-idle by default (Chen Yu).
      
         - Fix the handling of hibernation in the PCI core for devices with
           the DPM_FLAG_SMART_SUSPEND flag set to fix a regression affecting
           intel-lpss I2C devices (Mika Westerberg).
      
         - Drop development debug code from the brcmstb-avs-cpufreq driver
           (Markus Mayer)"
      
      * tag 'pm-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: brcmstb-avs-cpufreq: remove development debug support
        PCI / PM: Do not clear state_saved in pci_pm_freeze() when smart suspend is set
        ACPI / PM: Blacklist Low Power S0 Idle _DSM for ThinkPad X1 Tablet(2016)
    • Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random · 665fa000
      Committed by Linus Torvalds
      Pull /dev/random fixes from Ted Ts'o:
       "Fix a regression on NUMA kernels and suppress excess unseeded entropy
        pool warnings"
      
      * tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
        random: rate limit unseeded randomness warnings
        random: fix possible sleeping allocation from irq context
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 1334ac11
      Committed by Linus Torvalds
      Pull s390 fixes from Martin Schwidefsky:
       "A couple of bug fixes:
      
         - correct some CPU-MF counter names for z13 and z14
      
         - correct locking in the vfio-ccw fsm_io_helper function
      
         - provide arch_uretprobe_is_alive to avoid sigsegv with uretprobes
      
         - fix a corner case with CPU-MF sampling in regard to execve
      
         - fix expoline code revert for loadable modules
      
         - update chpid descriptor for resource accessibility events
      
         - fix dasd I/O errors due to outdated device alias information"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: correct module section names for expoline code revert
        vfio: ccw: process ssch with interrupts disabled
        s390: update sampling tag after task pid change
        s390/cpum_cf: rename IBM z13/z14 counter names
        s390/dasd: fix IO error for newly defined devices
        s390/uprobes: implement arch_uretprobe_is_alive()
        s390/cio: update chpid descriptor after resource accessibility event
  4. 26 April, 2018: 7 commits
    • Merge branches 'acpi-watchdog', 'acpi-button' and 'acpi-video' · bd6dff55
      Committed by Rafael J. Wysocki
      * acpi-watchdog:
        ACPI / watchdog: Prefer iTCO_wdt on Lenovo Z50-70
      
      * acpi-button:
        ACPI / button: make module loadable when booted in non-ACPI mode
      
      * acpi-video:
        ACPI / video: Only default only_lcd to true on Win8-ready _desktops_
    • Merge branches 'acpi-pm' and 'pm-cpufreq' · e140c4af
      Committed by Rafael J. Wysocki
      * acpi-pm:
        ACPI / PM: Blacklist Low Power S0 Idle _DSM for ThinkPad X1 Tablet(2016)
      
      * pm-cpufreq:
        cpufreq: brcmstb-avs-cpufreq: remove development debug support
    • Merge tag 'for_v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 69bfd470
      Committed by Linus Torvalds
      Pull fsnotify fix from Jan Kara:
       "A fix of a fsnotify race causing panics / softlockups"
      
      * tag 'for_v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fsnotify: Fix fsnotify_mark_connector race
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 3442097b
      Committed by Linus Torvalds
      Pull SCSI fixes from James Bottomley:
       "Eight bug fixes, one spelling update and one tracepoint addition.
      
        The most serious is probably the mptsas write same fix because it
        means anyone using these controllers sees errors when modern
        filesystems try to issue discards"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: target: fix crash with iscsi target and dvd
        scsi: sd_zbc: Avoid that resetting a zone fails sporadically
        scsi: sd: Defer spinning up drive while SANITIZE is in progress
        scsi: megaraid_sas: Do not log an error if FW successfully initializes.
        scsi: ufs: add trace event for ufs upiu
        scsi: core: remove reference to scsi_show_extd_sense()
        scsi: mptsas: Disable WRITE SAME
        scsi: fnic: fix spelling mistake in fnic stats "Abord" -> "Abort"
        scsi: scsi_debug: IMMED related delay adjustments
        scsi: iscsi: respond to netlink with unicast when appropriate
    • Merge tag 'for-linus-20180425' of git://git.kernel.dk/linux-block · 8fba70b0
      Committed by Linus Torvalds
      Pull block updates from Jens Axboe:
       "I ended up sitting on this about a week longer than I wanted to, since
        we were hashing out details with a timeout change. I've now killed
        that patch, so we can flush the existing queue in due time.
      
        This contains:
      
         - Fix for an old regression, where entering the queue can be
           disturbed by a signal to the process. This can cause spurious EIO.
           Fix from Alan Jenkins.
      
         - cdrom information leak fix from Dan.
      
         - Trivial helper for testing queue FUA from Dave Chinner, part of his
           O_DIRECT FUA series.
      
         - Series of swim fixes from Finn that actually make it work again.
      
         - Loop O_DIRECT corruption fix, which caused data corruption in
           production for us. From me.
      
         - BFQ crash fix from me.
      
         - bcache maintainer update. Michael no longer has the time to do it;
           Coly has stepped up to serve as the new maintainer.
      
         - blkcg locking fixes from Jiang Biao.
      
         - Revert of a change from Ming in this merge window, which causes an
           issue on some hardware.
      
         - Minor clarification doc addition from Linus Walleij"
      
      * tag 'for-linus-20180425' of git://git.kernel.dk/linux-block: (22 commits)
        Revert "blk-mq: remove code for dealing with remapping queue"
        block: mq: Add some minor doc for core structs
        bcache: mark Coly Li as bcache maintainer
        MAINTAINERS: Remove me as maintainer of bcache
        blkcg: init root blkcg_gq under lock
        blkcg: small fix on comment in blkcg_init_queue
        blkcg: don't hold blkcg lock when deactivating policy
        block: add blk_queue_fua() helper function
        cdrom: information leak in cdrom_ioctl_media_changed()
        bfq-iosched: ensure to clear bic/bfqq pointers when preparing request
        blk-mq: start request gstate with gen 1
        block/swim: Select appropriate drive on device open
        block/swim: Fix IO error at end of medium
        block/swim: Check drive type
        block/swim: Rename macros to avoid inconsistent inverted logic
        block/swim: Don't log an error message for an invalid ioctl
        block/swim: Remove extra put_disk() call from error path
        block/swim: Fix array bounds check
        m68k/mac: Don't remap SWIM MMIO region
        loop: handle short DIO reads
        ...
    • Merge tag 'riscv-for-linus-4.17-rc3' of... · c6dc3e71
      Committed by Linus Torvalds
      Merge tag 'riscv-for-linus-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V fixes from Palmer Dabbelt:
       "This contains three small fixes related to the RISC-V port that I'd
        like to target for 4.17-rc3:
      
         - a Kconfig cleanup to select DMA_DIRECT_OPS instead of redefining it
           in arch/riscv
      
         - the removal of asm/handle_irq.h, which doesn't exist, from our arch
           header list
      
         - the addition of "-no-pie" to the link rules for our VDSO-related
           files, which fixes the build on systems where PIE is enabled by
           default"
      
      * tag 'riscv-for-linus-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        RISC-V: build vdso-dummy.o with -no-pie
        riscv: there is no <asm/handle_irq.h>
        riscv: select DMA_DIRECT_OPS instead of redefining it
    • Merge tag 'dma-mapping-4.17-3' of git://git.infradead.org/users/hch/dma-mapping · 26ed24e4
      Committed by Linus Torvalds
      Pull dma-mapping fixes from Christoph Hellwig:
       "A few small dma-mapping fixes for Linux 4.17-rc3:
      
         - don't loop to try GFP_DMA allocations if ZONE_DMA is not actually
           enabled (regression in 4.16)
      
         - don't try to do virt_to_page before we know we actually have a valid
           page in dma_common_mmap
      
         - a comment fixup related to the above fix"
      
      * tag 'dma-mapping-4.17-3' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: postpone cpu addr translation on mmap
        dma-coherent: clarify dma_mmap_from_dev_coherent documentation
        dma-direct: don't retry allocation for no-op GFP_DMA
  5. 25 April, 2018: 16 commits