1. 18 12月, 2013 12 次提交
  2. 16 12月, 2013 8 次提交
    • L
      Linux 3.13-rc4 · 319e2e3f
      Linus Torvalds 提交于
      319e2e3f
    • M
      null_blk: mem garbage on NUMA systems during init · 57053d8c
      Matias Bjorling 提交于
      For NUMA systems, initializing the blk-mq layer and using per node hctx.
      We initialize submit queues to 1, while blk-mq nr_hw_queues is
      initialized to the number of NUMA nodes.
      
      This makes the null_init_hctx function overwrite memory outside of what
      it allocated.  In my case it lead to writing garbage into struct
      request_queue's mq_map.
      Signed-off-by: NMatias Bjorling <m@bjorling.me>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57053d8c
    • S
      radeon_pm: fix oops in hwmon_attributes_visible() and radeon_hwmon_show_temp_thresh() · e4158f1b
      Sergey Senozhatsky 提交于
      Since commit ec39f64b ("drm/radeon/dpm: Convert to use
      devm_hwmon_register_with_groups") radeon_hwmon_init() is using
      hwmon_device_register_with_groups(), which sets `rdev' as a device
      private driver_data, while hwmon_attributes_visible() and
      radeon_hwmon_show_temp_thresh() are still waiting for `drm_device'.
      
      Fix them by using dev_get_drvdata(), in order to avoid this oops:
      
        BUG: unable to handle kernel paging request at 0000000000001e28
        IP: [<ffffffffa02ae8b4>] hwmon_attributes_visible+0x18/0x3d [radeon]
        PGD 15057e067 PUD 151a8e067 PMD 0
        Oops: 0000 [#1] PREEMPT SMP
        Call Trace:
          internal_create_group+0x114/0x1d9
          sysfs_create_group+0xe/0x10
          sysfs_create_groups+0x22/0x5f
          device_add+0x34f/0x501
          device_register+0x15/0x18
          hwmon_device_register_with_groups+0xb5/0xed
          radeon_hwmon_init+0x56/0x7c [radeon]
          radeon_pm_init+0x134/0x7e5 [radeon]
          radeon_modeset_init+0x75f/0x8ed [radeon]
          radeon_driver_load_kms+0xc6/0x187 [radeon]
          drm_dev_register+0xf9/0x1b4 [drm]
          drm_get_pci_dev+0x98/0x129 [drm]
          radeon_pci_probe+0xa3/0xac [radeon]
          pci_device_probe+0x6e/0xcf
          driver_probe_device+0x98/0x1c4
          __driver_attach+0x5c/0x7e
          bus_for_each_dev+0x7b/0x85
          driver_attach+0x19/0x1b
          bus_add_driver+0x104/0x1ce
          driver_register+0x89/0xc5
          __pci_register_driver+0x58/0x5b
          drm_pci_init+0x86/0xea [drm]
          radeon_init+0x97/0x1000 [radeon]
          do_one_initcall+0x7f/0x117
          load_module+0x1583/0x1bb4
          SyS_init_module+0xa0/0xaf
      Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Alexander Deucher <Alexander.Deucher@amd.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e4158f1b
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4a251dd2
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Revert CHECKSUM_COMPLETE optimization in pskb_trim_rcsum(), I can't
          figure out why it breaks things.
      
       2) Fix comparison in netfilter ipset's hash_netnet4_data_equal(), it
          was basically doing "x == x", from Dave Jones.
      
       3) Freescale FEC driver was DMA mapping the wrong number of bytes, from
          Sebastian Siewior.
      
       4) Blackhole and prohibit routes in ipv6 were not doing the right thing
          because their ->input and ->output methods were not being assigned
          correctly.  Now they behave properly like their ipv4 counterparts.
          From Kamala R.
      
       5) Several drivers advertise the NETIF_F_FRAGLIST capability, but
          really do not support this feature and will send garbage packets if
          fed fraglist SKBs.  From Eric Dumazet.
      
       6) Fix long standing user triggerable BUG_ON over loopback in RDS
          protocol stack, from Venkat Venkatsubra.
      
       7) Several not so common code paths can potentially try to invoke
          packet scheduler actions that might be NULL without checking.  Shore
          things up by either 1) defining a method as mandatory and erroring
          on registration if that method is NULL 2) defininig a method as
          optional and the registration function hooks up a default
          implementation when NULL is seen.  From Jamal Hadi Salim.
      
       8) Fix fragment detection in xen-natback driver, from Paul Durrant.
      
       9) Kill dangling enter_memory_pressure method in cg_proto ops, from
          Eric W Biederman.
      
      10) SKBs that traverse namespaces should have their local_df cleared,
          from Hannes Frederic Sowa.
      
      11) IOCB file position is not being updated by macvtap_aio_read() and
          tun_chr_aio_read().  From Zhi Yong Wu.
      
      12) Don't free virtio_net netdev before releasing all of the NAPI
          instances.  From Andrey Vagin.
      
      13) Procfs entry leak in xt_hashlimit, from Sergey Popovich.
      
      14) IPv6 routes that are no cached routes should not count against the
          garbage collection limits.  We had this almost right, but were
          missing handling addrconf generated routes properly.  From Hannes
          Frederic Sowa.
      
      15) fib{4,6}_rule_suppress() have to consider potentially seeing NULL
          route info when they are called, from Stefan Tomanek.
      
      16) TUN and MACVTAP have had truncated packet signalling for some time,
          fix from Jason Wang.
      
      17) Fix use after frrr in __udp4_lib_rcv(), from Eric Dumazet.
      
      18) xen-netback does not interpret the NAPI budget properly for TX work,
          fix from Paul Durrant.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
        igb: Fix for issue where values could be too high for udelay function.
        i40e: fix null dereference
        xen-netback: fix gso_prefix check
        net: make neigh_priv_len in struct net_device 16bit instead of 8bit
        drivers: net: cpsw: fix for cpsw crash when build as modules
        xen-netback: napi: don't prematurely request a tx event
        xen-netback: napi: fix abuse of budget
        sch_tbf: use do_div() for 64-bit divide
        udp: ipv4: must add synchronization in udp_sk_rx_dst_set()
        net:fec: remove duplicate lines in comment about errata ERR006358
        Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature"
        8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature
        xen-netback: make sure skb linear area covers checksum field
        net: smc91x: Fix device tree based configuration so it's usable
        udp: ipv4: fix potential use after free in udp_v4_early_demux()
        macvtap: signal truncated packets
        tun: unbreak truncated packet signalling
        net: sched: htb: fix the calculation of quantum
        net: sched: tbf: fix the calculation of max_size
        micrel: add support for KSZ8041RNLI
        ...
      4a251dd2
    • L
      Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 908bfda7
      Linus Torvalds 提交于
      Pull x86 fixes from Peter Anvin:
       "This is a pretty small batch:
      
        The biggest single change is to stop using EFI time services on 32-bit
        platforms.  This matches our current behavior on 64-bit platforms as
        we already had ruled them out there as being too unreliable.  Turns
        out that affects 32-bit platforms, too.
      
        One NULL pointer fix for SGI UV.
      
        Two minor build fixes, one of which only affects icc and the other
        which affects icc and future versions or nonstandard default settings
        of gcc"
      
      * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, efi: Don't use (U)EFI time services on 32 bit
        x86, build, icc: Remove uninitialized_var() from compiler-intel.h
        x86/UV: Fix NULL pointer dereference in uv_flush_tlb_others() if the 'nobau' boot option is used
        x86, build: Pass in additional -mno-mmx, -mno-sse options
      908bfda7
    • L
      Merge tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 9199c4ca
      Linus Torvalds 提交于
      Pull PCI updates from Bjorn Helgaas:
       "PCI device hotplug
          - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael
            Wysocki)
      
        Host bridge drivers
          - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn
            Helgaas)
          - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
            (Jason Gunthorpe)
      
        Miscellaneous
          - Avoid unnecessary CPU switch when calling .probe() (Alexander
            Duyck)
          - Revert "workqueue: allow work_on_cpu() to be called recursively"
            (Bjorn Helgaas)
          - Disable Bus Master only on kexec reboot (Khalid Aziz)
          - Omit PCI ID macro strings to shorten quirk names for LTO (Michal
            Marek)"
      
      * tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers
        PCI: Disable Bus Master only on kexec reboot
        PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
        PCI: Omit PCI ID macro strings to shorten quirk names
        PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
        Revert "workqueue: allow work_on_cpu() to be called recursively"
        PCI: Avoid unnecessary CPU switch when calling driver .probe() method
      9199c4ca
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · b5745c59
      Linus Torvalds 提交于
      Pull SELinux fixes from James Morris.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()
        selinux: look for IPsec labels on both inbound and outbound packets
        selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()
        selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()
        selinux: fix possible memory leak
      b5745c59
    • L
      Revert "selinux: consider filesystem subtype in policies" · 29b1deb2
      Linus Torvalds 提交于
      This reverts commit 102aefdd.
      
      Tom London reports that it causes sync() to hang on Fedora rawhide:
      
        https://bugzilla.redhat.com/show_bug.cgi?id=1033965
      
      and Josh Boyer bisected it down to this commit.  Reverting the commit in
      the rawhide kernel fixes the problem.
      
      Eric Paris root-caused it to incorrect subtype matching in that commit
      breaking fuse, and has a tentative patch, but by now we're better off
      retrying this in 3.14 rather than playing with it any more.
      Reported-by: NTom London <selinux@gmail.com>
      Bisected-by: NJosh Boyer <jwboyer@fedoraproject.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Anand Avati <avati@redhat.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29b1deb2
  3. 15 12月, 2013 3 次提交
  4. 14 12月, 2013 12 次提交
    • L
      Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm · b2077ebc
      Linus Torvalds 提交于
      Pull ARM fixes from Russell King:
       "This resolves some further issues with the dma mask changes on ARM
        which have been found by TI and others, and also some corner cases
        with the updates to the virtual to physical address translations.
      
        Konstantin also found some problems with the unwinder, which now
        performs tighter verification that the stack is valid while unwinding"
      
      * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
        ARM: fix asm/memory.h build error
        ARM: 7917/1: cacheflush: correctly limit range of memory region being flushed
        ARM: 7913/1: fix framepointer check in unwind_frame
        ARM: 7912/1: check stack pointer in get_wchan
        ARM: 7909/1: mm: Call setup_dma_zone() post early_paging_init()
        ARM: 7908/1: mm: Fix the arm_dma_limit calculation
        ARM: another fix for the DMA mapping checks
      b2077ebc
    • L
      Merge tag 'arc-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 2430cdd0
      Linus Torvalds 提交于
      Pull ARC fixes from Vineet Gupta:
       "These are couple of weeks old already, but I just couldn't get them to
        you earlier.
      
         - couple of fixes for recently added perf code
         - build time extable sort"
      
      * tag 'arc-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: [perf] Fix a few thinkos
        ARC: Add guard macro to uapi/asm/unistd.h
        ARC: extable: Enable sorting at build time
      2430cdd0
    • L
      Merge tag 'dm-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 93e1585e
      Linus Torvalds 提交于
      Pull device mapper fixes from Mike Snitzer:
       "A set of device-mapper fixes for 3.13.
      
        A fix for possible memory corruption during DM table load, fix a
        possible leak of snapshot space in case of a crash, fix a possible
        deadlock due to a shared workqueue in the delay target, fix to
        initialize read-only module parameters that are used to export metrics
        for dm stats and dm bufio.
      
        Quite a few stable fixes were identified for both the thin-
        provisioning and caching targets as a result of increased regression
        testing using the device-mapper-test-suite (dmts).  The most notable
        of these are the reference counting fixes for the space map btree that
        is used by the dm-array interface -- without these the dm-cache
        metadata will leak, resulting in dm-cache devices running out of
        metadata blocks.  Also, some important fixes related to the
        thin-provisioning target's transition to read-only mode on error"
      
      * tag 'dm-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm array: fix a reference counting bug in shadow_ablock
        dm space map: disallow decrementing a reference count below zero
        dm stats: initialize read-only module parameter
        dm bufio: initialize read-only module parameters
        dm cache: actually resize cache
        dm cache: update Documentation for invalidate_cblocks's range syntax
        dm cache policy mq: fix promotions to occur as expected
        dm thin: allow pool in read-only mode to transition to read-write mode
        dm thin: re-establish read-only state when switching to fail mode
        dm thin: always fallback the pool mode if commit fails
        dm thin: switch to read-only mode if metadata space is exhausted
        dm thin: switch to read only mode if a mapping insert fails
        dm space map metadata: return on failure in sm_metadata_new_block
        dm table: fail dm_table_create on dm_round_up overflow
        dm snapshot: avoid snapshot space leak on crash
        dm delay: fix a possible deadlock due to shared workqueue
      93e1585e
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 1008ebb6
      Linus Torvalds 提交于
      Pull HID fixes from Jiri Kosina:
      
       - Genius Gx Imperator Keyboard regression fix (missing break in case),
         by Ben Hutchings
      
       - duplicate sysfs entry error fix for hid-sensor-hub driver, by
         Srinivas Pandruvada
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: hid-sensor-hub: fix duplicate sysfs entry error
        HID: kye: Fix missing break in kye_report_fixup()
      1008ebb6
    • R
      ARM: fix asm/memory.h build error · b713aa0b
      Russell King 提交于
      Jason Gunthorpe reports a build failure when ARM_PATCH_PHYS_VIRT is
      not defined:
      
      In file included from arch/arm/include/asm/page.h:163:0,
                       from include/linux/mm_types.h:16,
                       from include/linux/sched.h:24,
                       from arch/arm/kernel/asm-offsets.c:13:
      arch/arm/include/asm/memory.h: In function '__virt_to_phys':
      arch/arm/include/asm/memory.h:244:40: error: 'PHYS_OFFSET' undeclared (first use in this function)
      arch/arm/include/asm/memory.h:244:40: note: each undeclared identifier is reported only once for each function it appears in
      arch/arm/include/asm/memory.h: In function '__phys_to_virt':
      arch/arm/include/asm/memory.h:249:13: error: 'PHYS_OFFSET' undeclared (first use in this function)
      
      Fixes: ca5a45c0 ("ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions")
      Tested-By: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      b713aa0b
    • L
      Merge tag 'regulator-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · ca336751
      Linus Torvalds 提交于
      Pull regulator fixes from Mark Brown:
       "A small set of driver fixes plus one larger core change which changes
        the way we check to see if we're using DT so that there aren't any
        races between deciding we're using DT and the regulator subsystem
        noticing.
      
        This makes the new support for substituting a dummy regulator and
        optional regulators work a lot better on DT systems since it ensures
        that we don't trigger probe deferral when we shouldn't which was
        causing bugs in clients"
      
      * tag 'regulator-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: pfuze100: allow misprogrammed ID
        regulator: pfuze100: Fix address of FABID
        regulator: as3722: set the correct current limit
        regulator: core: Check for DT every time we check full constraints
        regulator: core: Replace checks of have_full_constraints with a function
      ca336751
    • L
      Merge tag 'regmap-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 599eefa0
      Linus Torvalds 提交于
      Pull regmap fixes from Mark Brown:
       "Two small changes to fix some error handling and checking (both of
        which would be quite serious if the errors trigger) plus a trivial
        documentation fix"
      
      * tag 'regmap-v3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: use IS_ERR() to check clk_get() results
        regmap: make sure we unlock on failure in regmap_bulk_write
        regmap: trivial comment fix (copy'n'paste error)
      599eefa0
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 31f984d1
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "Here are two simple but wanted fixes for the i2c subsystem"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: imx: Check the return value from clk_prepare_enable()
        i2c: mux: Inherit retry count and timeout from parent for muxed bus
      31f984d1
    • L
      Merge tag 'for-linus-20131212' of git://git.infradead.org/linux-mtd · dbb022cb
      Linus Torvalds 提交于
      Pull MTD fixes from Brian Norris:
       "Two MTD fixes, for the pxa3xx-nand driver:
      
         - This driver was not ready to fully Armada 370 NAND, with
           particularly notable problems seen on flash with 2KB page sizes.
           This "compatible" entry really should have been held back until
           3.14 or later.
      
         - Fix a bug seen in rare cases on the error path of a failed probe
           attempt, where we free unallocated DMA resources"
      
      * tag 'for-linus-20131212' of git://git.infradead.org/linux-mtd:
        mtd: nand: pxa3xx: Use info->use_dma to release DMA resources
        Partially revert "mtd: nand: pxa3xx: Introduce 'marvell,armada370-nand' compatible string"
      dbb022cb
    • L
      Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma · f6493505
      Linus Torvalds 提交于
      Pull slave-dmaengine fixes from Vinod Koul:
       "Here is the common fixes PULL for dmaengine.
      
        Dan has been working on fixing the build issues in bunch of drivers.
        Here we have one fixing s3c24xx-dma, along with fix from Russell on
        pl08x.  Also we have Kuninori rcar dma fixes.  The s3c24xx-dma which
        was added in last merge window missed updates to usage of DMA_COMPLETE
        so converting the last driver"
      
      * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
        dma: fix build breakage in s3c24xx-dma
        Fix pl08x warnings
        rcar-hpbdma: initialise plane information when halted
        rcar-hpbdma: fixup channel busy check for double plane
        rcar-hpbdma: add max transfer size
        dma: mmp_pdma: add missing platform_set_drvdata() in mmp_pdma_probe()
        dmaengine: s3c24xx-dma: use DMA_COMPLETE for dma completion status
      f6493505
    • J
      dm array: fix a reference counting bug in shadow_ablock · ed9571f0
      Joe Thornber 提交于
      An old array block could have its reference count decremented below
      zero when it is being replaced in the btree by a new array block.
      
      The fix is to increment the old ablock's reference count just before
      inserting a new ablock into the btree.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.9+
      ed9571f0
    • J
      dm space map: disallow decrementing a reference count below zero · 5b564d80
      Joe Thornber 提交于
      The old behaviour, returning -EINVAL if a ref_count of 0 would be
      decremented, was removed in commit f722063e ("dm space map: optimise
      sm_ll_dec and sm_ll_inc").  To fix this regression we return an error
      code from the mutator function pointer passed to sm_ll_mutate() and have
      dec_ref_count() return -EINVAL if the old ref_count is 0.
      
      Add a DMERR to reflect the potential seriousness of this error.
      
      Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path.
      
      With this fix the following dmts regression test now passes:
       dmtest run --suite cache -n /metadata_use_kernel/
      
      The next patch fixes the higher-level dm-array code that exposed this
      regression.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.12+
      5b564d80
  5. 13 12月, 2013 5 次提交
    • M
    • J
    • L
      Merge branch 'akpm' (fixes from Andrew) · 8d276377
      Linus Torvalds 提交于
      Merge patches from Andrew Morton:
        "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: memcg: do not allow task about to OOM kill to bypass the limit
        mm: memcg: fix race condition between memcg teardown and swapin
        thp: move preallocated PTE page table on move_huge_pmd()
        mfd/rtc: s5m: fix register updating by adding regmap for RTC
        rtc: s5m: enable IRQ wake during suspend
        rtc: s5m: limit endless loop waiting for register update
        rtc: s5m: fix unsuccesful IRQ request during probe
        drivers/rtc/rtc-s5m.c: fix info->rtc assignment
        include/linux/kernel.h: make might_fault() a nop for !MMU
        drivers/rtc/rtc-at91rm9200.c: correct alarm over day/month wrap
        procfs: also fix proc_reg_get_unmapped_area() for !MMU case
        mm: memcg: do not declare OOM from __GFP_NOFAIL allocations
        include/linux/hugetlb.h: make isolate_huge_page() an inline
      8d276377
    • J
      mm: memcg: do not allow task about to OOM kill to bypass the limit · 1f14c1ac
      Johannes Weiner 提交于
      Commit 49426420 ("mm: memcg: handle non-error OOM situations more
      gracefully") allowed tasks that already entered a memcg OOM condition to
      bypass the memcg limit on subsequent allocation attempts hoping this
      would expedite finishing the page fault and executing the kill.
      
      David Rientjes is worried that this breaks memcg isolation guarantees
      and since there is no evidence that the bypass actually speeds up fault
      processing just change it so that these subsequent charge attempts fail
      outright.  The notable exception being __GFP_NOFAIL charges which are
      required to bypass the limit regardless.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-bt: David Rientjes <rientjes@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f14c1ac
    • J
      mm: memcg: fix race condition between memcg teardown and swapin · 96f1c58d
      Johannes Weiner 提交于
      There is a race condition between a memcg being torn down and a swapin
      triggered from a different memcg of a page that was recorded to belong
      to the exiting memcg on swapout (with CONFIG_MEMCG_SWAP extension).  The
      result is unreclaimable pages pointing to dead memcgs, which can lead to
      anything from endless loops in later memcg teardown (the page is charged
      to all hierarchical parents but is not on any LRU list) or crashes from
      following the dangling memcg pointer.
      
      Memcgs with tasks in them can not be torn down and usually charges don't
      show up in memcgs without tasks.  Swapin with the CONFIG_MEMCG_SWAP
      extension is the notable exception because it charges the cgroup that
      was recorded as owner during swapout, which may be empty and in the
      process of being torn down when a task in another memcg triggers the
      swapin:
      
        teardown:                 swapin:
      
                                  lookup_swap_cgroup_id()
                                  rcu_read_lock()
                                  mem_cgroup_lookup()
                                  css_tryget()
                                  rcu_read_unlock()
        disable css_tryget()
        call_rcu()
          offline_css()
            reparent_charges()
                                  res_counter_charge() (hierarchical!)
                                  css_put()
                                    css_free()
                                  pc->mem_cgroup = dead memcg
                                  add page to dead lru
      
      Add a final reparenting step into css_free() to make sure any such raced
      charges are moved out of the memcg before it's finally freed.
      
      In the longer term it would be cleaner to have the css_tryget() and the
      res_counter charge under the same RCU lock section so that the charge
      reparenting is deferred until the last charge whose tryget succeeded is
      visible.  But this will require more invasive changes that will be
      harder to evaluate and backport into stable, so better defer them to a
      separate change set.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96f1c58d