1. 09 8月, 2014 13 次提交
    • Y
      initramfs: support initramfs that is bigger than 2GiB · d97b07c5
      Yinghai Lu 提交于
      Now with 64bit bzImage and kexec tools, we support ramdisk that size is
      bigger than 2g, as we could put it above 4G.
      
      Found compressed initramfs image could not be decompressed properly.  It
      turns out that image length is int during decompress detection, and it
      will become < 0 when length is more than 2G.  Furthermore, during
      decompressing len as int is used for inbuf count, that has problem too.
      
      Change len to long, that should be ok as on 32 bit platform long is
      32bits.
      
      Tested with following compressed initramfs image as root with kexec.
      	gzip, bzip2, xz, lzma, lzop, lz4.
      run time for populate_rootfs():
         size        name       Nehalem-EX  Westmere-EX  Ivybridge-EX
       9034400256 root_img     :   26s           24s          30s
       3561095057 root_img.lz4 :   28s           27s          27s
       3459554629 root_img.lzo :   29s           29s          28s
       3219399480 root_img.gz  :   64s           62s          49s
       2251594592 root_img.xz  :  262s          260s         183s
       2226366598 root_img.lzma:  386s          376s         277s
       2901482513 root_img.bz2 :  635s          599s
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Rashika Kheria <rashika.kheria@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kyungsik Lee <kyungsik.lee@lge.com>
      Cc: P J P <ppandit@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: "Daniel M. Weeks" <dan@danweeks.net>
      Cc: Alexandre Courbot <acourbot@nvidia.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d97b07c5
    • J
      panic: add TAINT_SOFTLOCKUP · 69361eef
      Josh Hunt 提交于
      This taint flag will be set if the system has ever entered a softlockup
      state.  Similar to TAINT_WARN it is useful to know whether or not the
      system has been in a softlockup state when debugging.
      
      [akpm@linux-foundation.org: apply the taint before calling panic()]
      Signed-off-by: NJosh Hunt <johunt@akamai.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69361eef
    • J
      sysctl: remove typedef ctl_table · e5eea098
      Joe Perches 提交于
      Remove the final user, and the typedef itself.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e5eea098
    • A
      rapidio: add new RapidIO DMA interface routines · 4aff1ce7
      Alexandre Bounine 提交于
      Add RapidIO DMA interface routines that directly use reference to the mport
      device object and/or target device destination ID as parameters.
      This allows to perform RapidIO DMA transfer requests by modules that do not
      have an access to the RapidIO device list.
      Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
      Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4aff1ce7
    • V
      kernel/fork.c: make mm_init_owner static · 33144e84
      Vladimir Davydov 提交于
      It's only used in fork.c:mm_init().
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33144e84
    • V
      fork/exec: cleanup mm initialization · 41f727fd
      Vladimir Davydov 提交于
      mm initialization on fork/exec is spread all over the place, which makes
      the code look inconsistent.
      
      We have mm_init(), which is supposed to init/nullify mm's internals, but
      it doesn't init all the fields it should:
      
       - on fork ->mmap,mm_rb,vmacache_seqnum,map_count,mm_cpumask,locked_vm
         are zeroed in dup_mmap();
      
       - on fork ->pmd_huge_pte is zeroed in dup_mm(), immediately before
         calling mm_init();
      
       - ->cpu_vm_mask_var ptr is initialized by mm_init_cpumask(), which is
         called before mm_init() on both fork and exec;
      
       - ->context is initialized by init_new_context(), which is called after
         mm_init() on both fork and exec;
      
      Let's consolidate all the initializations in mm_init() to make the code
      look cleaner.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41f727fd
    • F
      proc: constify seq_operations · ccf94f1b
      Fabian Frederick 提交于
      proc_uid_seq_operations, proc_gid_seq_operations and
      proc_projid_seq_operations are only called in proc_id_map_open with
      seq_open as const struct seq_operations so we can constify the 3
      structures and update proc_id_map_open prototype.
      
         text    data     bss     dec     hex filename
         6817     404    1984    9205    23f5 kernel/user_namespace.o-before
         6913     308    1984    9205    23f5 kernel/user_namespace.o-after
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ccf94f1b
    • G
      memcg: remove lookup_cgroup_page() prototype · 3cbb0187
      Greg Thelen 提交于
      Commit 6b208e3f ("mm: memcg: remove unused node/section info from
      pc->flags") deleted lookup_cgroup_page() but left a prototype for it.
      
      Kill the vestigial prototype.
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3cbb0187
    • V
      page-cgroup: get rid of NR_PCG_FLAGS · 9a3f4d85
      Vladimir Davydov 提交于
      It's not used anywhere today, so let's remove it.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a3f4d85
    • V
      page-cgroup: trivial cleanup · 434584fe
      Vladimir Davydov 提交于
      Add forward declarations for struct pglist_data, mem_cgroup.
      
      Remove __init, __meminit from function prototypes and inline functions.
      
      Remove redundant inclusion of bit_spinlock.h.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      434584fe
    • J
      mm: memcontrol: use page lists for uncharge batching · 747db954
      Johannes Weiner 提交于
      Pages are now uncharged at release time, and all sources of batched
      uncharges operate on lists of pages.  Directly use those lists, and
      get rid of the per-task batching state.
      
      This also batches statistics accounting, in addition to the res
      counter charges, to reduce IRQ-disabling and re-enabling.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      747db954
    • J
      mm: memcontrol: rewrite uncharge API · 0a31bc97
      Johannes Weiner 提交于
      The memcg uncharging code that is involved towards the end of a page's
      lifetime - truncation, reclaim, swapout, migration - is impressively
      complicated and fragile.
      
      Because anonymous and file pages were always charged before they had their
      page->mapping established, uncharges had to happen when the page type
      could still be known from the context; as in unmap for anonymous, page
      cache removal for file and shmem pages, and swap cache truncation for swap
      pages.  However, these operations happen well before the page is actually
      freed, and so a lot of synchronization is necessary:
      
      - Charging, uncharging, page migration, and charge migration all need
        to take a per-page bit spinlock as they could race with uncharging.
      
      - Swap cache truncation happens during both swap-in and swap-out, and
        possibly repeatedly before the page is actually freed.  This means
        that the memcg swapout code is called from many contexts that make
        no sense and it has to figure out the direction from page state to
        make sure memory and memory+swap are always correctly charged.
      
      - On page migration, the old page might be unmapped but then reused,
        so memcg code has to prevent untimely uncharging in that case.
        Because this code - which should be a simple charge transfer - is so
        special-cased, it is not reusable for replace_page_cache().
      
      But now that charged pages always have a page->mapping, introduce
      mem_cgroup_uncharge(), which is called after the final put_page(), when we
      know for sure that nobody is looking at the page anymore.
      
      For page migration, introduce mem_cgroup_migrate(), which is called after
      the migration is successful and the new page is fully rmapped.  Because
      the old page is no longer uncharged after migration, prevent double
      charges by decoupling the page's memcg association (PCG_USED and
      pc->mem_cgroup) from the page holding an actual charge.  The new bits
      PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
      to the new page during migration.
      
      mem_cgroup_migrate() is suitable for replace_page_cache() as well,
      which gets rid of mem_cgroup_replace_page_cache().  However, care
      needs to be taken because both the source and the target page can
      already be charged and on the LRU when fuse is splicing: grab the page
      lock on the charge moving side to prevent changing pc->mem_cgroup of a
      page under migration.  Also, the lruvecs of both pages change as we
      uncharge the old and charge the new during migration, and putback may
      race with us, so grab the lru lock and isolate the pages iff on LRU to
      prevent races and ensure the pages are on the right lruvec afterward.
      
      Swap accounting is massively simplified: because the page is no longer
      uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
      transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
      before the final put_page() in page reclaim.
      
      Finally, page_cgroup changes are now protected by whatever protection the
      page itself offers: anonymous pages are charged under the page table lock,
      whereas page cache insertions, swapin, and migration hold the page lock.
      Uncharging happens under full exclusion with no outstanding references.
      Charging and uncharging also ensure that the page is off-LRU, which
      serializes against charge migration.  Remove the very costly page_cgroup
      lock and set pc->flags non-atomically.
      
      [mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
      [vdavydov@parallels.com: fix flags definition]
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Tested-by: NJet Chen <jet.chen@intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Tested-by: NFelipe Balbi <balbi@ti.com>
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0a31bc97
    • J
      mm: memcontrol: rewrite charge API · 00501b53
      Johannes Weiner 提交于
      These patches rework memcg charge lifetime to integrate more naturally
      with the lifetime of user pages.  This drastically simplifies the code and
      reduces charging and uncharging overhead.  The most expensive part of
      charging and uncharging is the page_cgroup bit spinlock, which is removed
      entirely after this series.
      
      Here are the top-10 profile entries of a stress test that reads a 128G
      sparse file on a freshly booted box, without even a dedicated cgroup (i.e.
       executing in the root memcg).  Before:
      
          15.36%              cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.31%              cat  [kernel.kallsyms]   [k] memset
          11.48%              cat  [kernel.kallsyms]   [k] do_mpage_readpage
           4.23%              cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.38%              cat  [kernel.kallsyms]   [k] put_page
           2.32%              cat  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge
           2.18%          kswapd0  [kernel.kallsyms]   [k] __mem_cgroup_uncharge_common
           1.92%          kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.86%              cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.62%              cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
      
      After:
      
          15.67%           cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.48%           cat  [kernel.kallsyms]   [k] memset
          11.42%           cat  [kernel.kallsyms]   [k] do_mpage_readpage
           3.98%           cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.46%           cat  [kernel.kallsyms]   [k] put_page
           2.13%       kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.88%           cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.67%           cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
           1.39%       kswapd0  [kernel.kallsyms]   [k] free_pcppages_bulk
           1.30%           cat  [kernel.kallsyms]   [k] kfree
      
      As you can see, the memcg footprint has shrunk quite a bit.
      
         text    data     bss     dec     hex filename
        37970    9892     400   48262    bc86 mm/memcontrol.o.old
        35239    9892     400   45531    b1db mm/memcontrol.o
      
      This patch (of 4):
      
      The memcg charge API charges pages before they are rmapped - i.e.  have an
      actual "type" - and so every callsite needs its own set of charge and
      uncharge functions to know what type is being operated on.  Worse,
      uncharge has to happen from a context that is still type-specific, rather
      than at the end of the page's lifetime with exclusive access, and so
      requires a lot of synchronization.
      
      Rewrite the charge API to provide a generic set of try_charge(),
      commit_charge() and cancel_charge() transaction operations, much like
      what's currently done for swap-in:
      
        mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
        pages from the memcg if necessary.
      
        mem_cgroup_commit_charge() commits the page to the charge once it
        has a valid page->mapping and PageAnon() reliably tells the type.
      
        mem_cgroup_cancel_charge() aborts the transaction.
      
      This reduces the charge API and enables subsequent patches to
      drastically simplify uncharging.
      
      As pages need to be committed after rmap is established but before they
      are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
      additions again.  Revive lru_cache_add_active_or_unevictable().
      
      [hughd@google.com: fix shmem_unuse]
      [hughd@google.com: Add comments on the private use of -EAGAIN]
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00501b53
  2. 07 8月, 2014 27 次提交