1. 16 May, 2012 2 commits
    • J
      jbd: Write journal superblock with WRITE_FUA after checkpointing · fd2cbd4d
      Jan Kara committed
      If the journal superblock is written only to the disk's caches and another transaction
      starts reusing space of the transaction cleaned from the log, it can happen
      that blocks of the new transaction reach the disk before the journal superblock. When
      a power failure happens in such a case, subsequent journal replay would still try
      to replay the old transaction, but some of its blocks may already be
      overwritten by the new transaction. For this reason we must use WRITE_FUA when
      updating the log tail, and we must first write the new log tail to disk and update
      in-memory information only after that.
      Signed-off-by: Jan Kara <jack@suse.cz>
      fd2cbd4d
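The ordering described in the commit above (persist the new log tail first, update in-memory state only afterwards) can be sketched in userspace C. This is a minimal illustration, not the jbd code: `struct sketch_journal`, `sketch_write_sb_fua()` and `sketch_update_log_tail()` are hypothetical names, and the FUA write is simulated by a struct copy with an injectable error.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the on-disk journal superblock fields. */
struct sketch_sb {
    uint32_t s_sequence; /* first transaction ID expected in the log */
    uint32_t s_start;    /* block number where the log tail starts */
};

/* Hypothetical stand-in for the in-memory journal state. */
struct sketch_journal {
    struct sketch_sb disk;   /* what is durably on "disk" */
    uint32_t tail_sequence;  /* in-memory copy of the tail transaction ID */
    uint32_t tail;           /* in-memory copy of the tail block */
    bool inject_write_error; /* test knob: make the simulated write fail */
};

/* Simulated FUA write: on success the data is durable when we return. */
static bool sketch_write_sb_fua(struct sketch_journal *j,
                                const struct sketch_sb *sb)
{
    if (j->inject_write_error)
        return false;
    j->disk = *sb;
    return true;
}

/*
 * Update the log tail. The durable write comes first; the in-memory tail
 * (which allows the freed log space to be reused) changes only once the
 * write has succeeded. Returns 0 on success, -1 on write failure.
 */
int sketch_update_log_tail(struct sketch_journal *j, uint32_t tid,
                           uint32_t block)
{
    struct sketch_sb sb = { .s_sequence = tid, .s_start = block };

    if (!sketch_write_sb_fua(j, &sb))
        return -1; /* in-memory tail is left untouched */

    j->tail_sequence = tid;
    j->tail = block;
    return 0;
}
```

If the simulated write fails, the in-memory tail still points at the old transaction, so nothing may reuse its log space yet.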
    • J
      jbd: Split updating of journal superblock and marking journal empty · 9754e39c
      Jan Kara committed
      There are three cases of updating the journal superblock. In the first case, we want
      to mark the journal as empty (setting s_sequence to 0), in the second case we want
      to update the log tail, in the third case we want to update s_errno. Split these
      cases into separate functions. It makes the code slightly more straightforward
      and later patches will make the distinction even more important.
      Signed-off-by: Jan Kara <jack@suse.cz>
      9754e39c
  2. 11 April, 2012 1 commit
    • J
      jbd: Refine commit writeout logic · 2db938be
      Jan Kara committed
      Currently we write out all journal buffers in WRITE_SYNC mode. This improves
      performance for fsync-heavy workloads but hinders performance when writes
      are mostly asynchronous; most noticeably it slows down readers, and users
      complain about slow desktop response etc.
      
      So submit writes as asynchronous in the normal case and only submit writes as
      WRITE_SYNC if we detect someone is waiting for current transaction commit.
      
      I've gathered some numbers to back this change. The first is the read latency
      test. It measures time to read 1 MB after several seconds of sleeping in
      presence of streaming writes.
      
      Top 10 times (out of 90) in us:
      Before		After
      2131586		697473
      1709932		557487
      1564598		535642
      1480462		347573
      1478579		323153
      1408496		222181
      1388960		181273
      1329565		181070
      1252486		172832
      1223265		172278
      
      Average:
      619377		82180
      
      So the improvement in both maximum and average latency is massive.
      
      I've measured fsync throughput by:
      fs_mark -n 100 -t 1 -s 16384 -d /mnt/fsync/ -S 1 -L 4
      
      in presence of streaming reader. The numbers (fsyncs/s) are:
      Before		After
      9.9		6.3
      6.8		6.0
      6.3		6.2
      5.8		6.1
      
      So fsync performance seems unharmed by this change.
      Signed-off-by: Jan Kara <jack@suse.cz>
      2db938be
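The decision described in the commit above (asynchronous writes by default, WRITE_SYNC only when someone waits for the commit) can be sketched as follows. This is a sketch under stated assumptions: it assumes the journal records the highest transaction ID that any task is waiting on, and that IDs are compared in a wraparound-safe way. All names (`sketch_tid_t`, `sketch_tid_geq()`, `sketch_pick_write_op()`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sketch_tid_t; /* hypothetical transaction ID type */

enum sketch_write_op {
    SKETCH_WRITE_ASYNC, /* normal case: do not penalize readers */
    SKETCH_WRITE_SYNC   /* someone is blocked waiting for this commit */
};

/* Wraparound-safe "x >= y" for IDs that increase monotonically. */
static bool sketch_tid_geq(sketch_tid_t x, sketch_tid_t y)
{
    return (int32_t)(x - y) >= 0;
}

/*
 * Pick the write mode for the transaction being committed.
 * highest_waited_tid is the newest transaction ID a waiter asked for.
 */
enum sketch_write_op sketch_pick_write_op(sketch_tid_t highest_waited_tid,
                                          sketch_tid_t committing_tid)
{
    if (sketch_tid_geq(highest_waited_tid, committing_tid))
        return SKETCH_WRITE_SYNC;
    return SKETCH_WRITE_ASYNC;
}
```

With this rule a commit that nobody waits on is submitted asynchronously, while an fsync-style waiter on the committing (or a later) transaction switches it to synchronous submission.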
  3. 06 April, 2012 7 commits
  4. 04 April, 2012 4 commits
  5. 03 April, 2012 2 commits
    • P
      firewire: restore the device.h include in linux/firewire.h · f68c56b7
      Paul Gortmaker committed
      Commit 313162d0 ("device.h: audit and cleanup users in main include
      dir") exchanged an include <linux/device.h> for a struct *device but in
      actuality I misread this file when creating 313162d0 and it should have
      remained an include.
      
      There were no build regressions since all consumers were already getting
      device.h anyway, but make it right regardless.
      Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f68c56b7
    • P
      avr32: fix build failures from mis-naming of atmel_nand.h · 3d92e051
      Paul Gortmaker committed
      Commit bf4289cb ("ATMEL: fix nand ecc support") indicated that it
      wanted to "Move platform data to a common header
      include/linux/platform_data/atmel_nand.h" and the new header even had
      re-include protectors with:
      
          #ifndef __ATMEL_NAND_H__
      
      However, the file that was added was simply called atmel.h
      and this caused avr32 defconfig to fail with:
      
        In file included from arch/avr32/boards/atstk1000/setup.c:22:
        arch/avr32/mach-at32ap/include/mach/board.h:10:44: error: linux/platform_data/atmel_nand.h: No such file or directory
        In file included from arch/avr32/boards/atstk1000/setup.c:22:
        arch/avr32/mach-at32ap/include/mach/board.h:121: warning: 'struct atmel_nand_data' declared inside parameter list
        arch/avr32/mach-at32ap/include/mach/board.h:121: warning: its scope is only this definition or declaration, which is probably not what you want
        make[2]: *** [arch/avr32/boards/atstk1000/setup.o] Error 1
      
      It seems the scope of the file contents will expand beyond
      just nand, so ignore the original intention, and fix up the
      users who reference the bad name with the _nand suffix.
      
      CC: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      CC: David Woodhouse <dwmw2@infradead.org>
      Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d92e051
  6. 02 April, 2012 1 commit
  7. 01 April, 2012 4 commits
  8. 31 March, 2012 1 commit
  9. 30 March, 2012 7 commits
  10. 29 March, 2012 11 commits
    • S
      crypto: user - Fix size of netlink dump message · 5219a534
      Steffen Klassert committed
      The default netlink message size limit might be exceeded when dumping a
      lot of algorithms to userspace. As a result, not all of the instantiated
      algorithms are dumped to userspace. So calculate an upper bound on the message
      size and call netlink_dump_start() with that value.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      5219a534
    • R
      cpumask: remove old cpu_*_map. · 615399c8
      Rusty Russell committed
      These are obsolete: cpu_*_mask provides (const) pointers.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      615399c8
    • K
      radix-tree: introduce bit-optimized iterator · 78c1d784
      Konstantin Khlebnikov committed
      A series of radix tree cleanups, and usage of them in the core pagecache
      code.
      
      Micro-benchmark:
      
      lookup 14 slots (typical page-vector size)
      in a radix-tree where each <step>-th slot is filled and tagged
      before/after - nsec per full scan through tree
      
      * Intel Sandy Bridge i7-2620M 4Mb L3
      New code always faster
      
      * AMD Athlon 6000+ 2x1Mb L2, without L3
      New code generally faster,
      Minor degradation (marked with "*") for huge sparse trees
      
      * i386 on Sandy Bridge
      New code faster for common cases: tagged and dense trees.
      Some degradations for non-tagged lookup on sparse trees.
      
      Ideally, an __ffs() analog for finding the first non-zero long element
      in an array might help; gcc sometimes cannot optimize this loop correctly.
      
      Numbers:
      
      CPU: Intel Sandy Bridge i7-2620M 4Mb L3
      
      radix-tree with 1024 slots:
      
      tagged lookup
      
      step  1      before  7156        after  3613
      step  2      before  5399        after  2696
      step  3      before  4779        after  1928
      step  4      before  4456        after  1429
      step  5      before  4292        after  1213
      step  6      before  4183        after  1052
      step  7      before  4157        after  951
      step  8      before  4016        after  812
      step  9      before  3952        after  851
      step  10     before  3937        after  732
      step  11     before  4023        after  709
      step  12     before  3872        after  657
      step  13     before  3892        after  633
      step  14     before  3720        after  591
      step  15     before  3879        after  578
      step  16     before  3561        after  513
      
      normal lookup
      
      step  1      before  4266       after  3301
      step  2      before  2695       after  2129
      step  3      before  2083       after  1712
      step  4      before  1801       after  1534
      step  5      before  1628       after  1313
      step  6      before  1551       after  1263
      step  7      before  1475       after  1185
      step  8      before  1432       after  1167
      step  9      before  1373       after  1092
      step  10     before  1339       after  1134
      step  11     before  1292       after  1056
      step  12     before  1319       after  1030
      step  13     before  1276       after  1004
      step  14     before  1256       after  987
      step  15     before  1228       after  992
      step  16     before  1247       after  999
      
      radix-tree with 1024*1024*128 slots:
      
      tagged lookup
      
      step  1      before  1086102841  after  674196409
      step  2      before  816839155   after  498138306
      step  7      before  599728907   after  240676762
      step  15     before  555729253   after  185219677
      step  63     before  606637748   after  128585664
      step  64     before  608384432   after  102945089
      step  65     before  596987114   after  123996019
      step  128    before  304459225   after  56783056
      step  256    before  158846855   after  31232481
      step  512    before  86085652    after  18950595
      step  12345  before  6517189     after  1674057
      
      normal lookup
      
      step  1      before  626064869  after  544418266
      step  2      before  418809975  after  336321473
      step  7      before  242303598  after  207755560
      step  15     before  208380563  after  176496355
      step  63     before  186854206  after  167283638
      step  64     before  176188060  after  170143976
      step  65     before  185139608  after  167487116
      step  128    before  88181865   after  86913490
      step  256    before  45733628   after  45143534
      step  512    before  24506038   after  23859036
      step  12345  before  2177425    after  2018662
      
      * AMD Athlon 6000+ 2x1Mb L2, without L3
      
      radix-tree with 1024 slots:
      
      tag-lookup
      
      step  1      before  8164        after  5379
      step  2      before  5818        after  5581
      step  3      before  4959        after  4213
      step  4      before  4371        after  3386
      step  5      before  4204        after  2997
      step  6      before  4950        after  2744
      step  7      before  4598        after  2480
      step  8      before  4251        after  2288
      step  9      before  4262        after  2243
      step  10     before  4175        after  2131
      step  11     before  3999        after  2024
      step  12     before  3979        after  1994
      step  13     before  3842        after  1929
      step  14     before  3750        after  1810
      step  15     before  3735        after  1810
      step  16     before  3532        after  1660
      
      normal-lookup
      
      step  1      before  7875        after  5847
      step  2      before  4808        after  4071
      step  3      before  4073        after  3462
      step  4      before  3677        after  3074
      step  5      before  4308        after  2978
      step  6      before  3911        after  3807
      step  7      before  3635        after  3522
      step  8      before  3313        after  3202
      step  9      before  3280        after  3257
      step  10     before  3166        after  3083
      step  11     before  3066        after  3026
      step  12     before  2985        after  2982
      step  13     before  2925        after  2924
      step  14     before  2834        after  2808
      step  15     before  2805        after  2803
      step  16     before  2647        after  2622
      
      radix-tree with 1024*1024*128 slots:
      
      tag-lookup
      
      step  1      before  1288059720  after  951736580
      step  2      before  961292300   after  884212140
      step  7      before  768905140   after  547267580
      step  15     before  771319480   after  456550640
      step  63     before  504847640   after  242704304
      step  64     before  392484800   after  177920786
      step  65     before  491162160   after  246895264
      step  128    before  208084064   after  97348392
      step  256    before  112401035   after  51408126
      step  512    before  75825834    after  29145070
      step  12345  before  5603166     after  2847330
      
      normal-lookup
      
      step  1      before  1025677120  after  861375100
      step  2      before  647220080   after  572258540
      step  7      before  505518960   after  484041813
      step  15     before  430483053   after  444815320	*
      step  63     before  388113453   after  404250546	*
      step  64     before  374154666   after  396027440	*
      step  65     before  381423973   after  396704853	*
      step  128    before  190078700   after  202619384	*
      step  256    before  100886756   after  102829108	*
      step  512    before  64074505    after  56158720
      step  12345  before  4237289     after  4422299		*
      
      * i686 on Sandy bridge
      
      radix-tree with 1024 slots:
      
      tagged lookup
      
      step  1      before  7990        after  4019
      step  2      before  5698        after  2897
      step  3      before  5013        after  2475
      step  4      before  4630        after  1721
      step  5      before  4346        after  1759
      step  6      before  4299        after  1556
      step  7      before  4098        after  1513
      step  8      before  4115        after  1222
      step  9      before  3983        after  1390
      step  10     before  4077        after  1207
      step  11     before  3921        after  1231
      step  12     before  3894        after  1116
      step  13     before  3840        after  1147
      step  14     before  3799        after  1090
      step  15     before  3797        after  1059
      step  16     before  3783        after  745
      
      normal lookup
      
      step  1      before  5103       after  3499
      step  2      before  3299       after  2550
      step  3      before  2489       after  2370
      step  4      before  2034       after  2302		*
      step  5      before  1846       after  2268		*
      step  6      before  1752       after  2249		*
      step  7      before  1679       after  2164		*
      step  8      before  1627       after  2153		*
      step  9      before  1542       after  2095		*
      step  10     before  1479       after  2109		*
      step  11     before  1469       after  2009		*
      step  12     before  1445       after  2039		*
      step  13     before  1411       after  2013		*
      step  14     before  1374       after  2046		*
      step  15     before  1340       after  1975		*
      step  16     before  1331       after  2000		*
      
      radix-tree with 1024*1024*128 slots:
      
      tagged lookup
      
      step  1      before  1225865377  after  667153553
      step  2      before  842427423   after  471533007
      step  7      before  609296153   after  276260116
      step  15     before  544232060   after  226859105
      step  63     before  519209199   after  141343043
      step  64     before  588980279   after  141951339
      step  65     before  521099710   after  138282060
      step  128    before  298476778   after  83390628
      step  256    before  149358342   after  43602609
      step  512    before  76994713    after  22911077
      step  12345  before  53286669     after  1472111
      
      normal lookup
      
      step  1      before  819284564  after  533635310
      step  2      before  512421605  after  364956155
      step  7      before  271443305  after  305721345	*
      step  15     before  223591630  after  273960216	*
      step  63     before  190320247  after  217770207	*
      step  64     before  178538168  after  267411372	*
      step  65     before  186400423  after  215347937	*
      step  128    before  88106045   after  140540612	*
      step  256    before  44812420   after  70660377		*
      step  512    before  24435438   after  36328275		*
      step  12345  before  2123924    after  2148062		*
      
      bloat-o-meter delta for this patchset + patchset with related shmem cleanups
      
      bloat-o-meter: x86_64
      
      add/remove: 4/3 grow/shrink: 5/6 up/down: 928/-939 (-11)
      function                                     old     new   delta
      radix_tree_next_chunk                          -     499    +499
      shmem_unuse                                  428     554    +126
      shmem_radix_tree_replace                     131     227     +96
      find_get_pages_tag                           354     419     +65
      find_get_pages_contig                        345     407     +62
      find_get_pages                               362     396     +34
      __kstrtab_radix_tree_next_chunk                -      22     +22
      __ksymtab_radix_tree_next_chunk                -      16     +16
      __kcrctab_radix_tree_next_chunk                -       8      +8
      radix_tree_gang_lookup_slot                  204     203      -1
      static.shmem_xattr_set                       384     381      -3
      radix_tree_gang_lookup_tag_slot              208     191     -17
      radix_tree_gang_lookup                       231     187     -44
      radix_tree_gang_lookup_tag                   247     199     -48
      shmem_unlock_mapping                         278     190     -88
      __lookup                                     217       -    -217
      __lookup_tag                                 242       -    -242
      radix_tree_locate_item                       279       -    -279
      
      bloat-o-meter: i386
      
      add/remove: 3/3 grow/shrink: 8/9 up/down: 1075/-1275 (-200)
      function                                     old     new   delta
      radix_tree_next_chunk                          -     757    +757
      shmem_unuse                                  352     449     +97
      find_get_pages_contig                        269     322     +53
      shmem_radix_tree_replace                     113     154     +41
      find_get_pages_tag                           277     318     +41
      dcache_dir_lseek                             426     458     +32
      __kstrtab_radix_tree_next_chunk                -      22     +22
      vc_do_resize                                 968     977      +9
      snd_pcm_lib_read1                            725     733      +8
      __ksymtab_radix_tree_next_chunk                -       8      +8
      netlbl_cipsov4_list                         1120    1127      +7
      find_get_pages                               293     291      -2
      new_slab                                     467     459      -8
      bitfill_unaligned_rev                        425     417      -8
      radix_tree_gang_lookup_tag_slot              177     146     -31
      blk_dump_cmd                                 267     229     -38
      radix_tree_gang_lookup_slot                  212     134     -78
      shmem_unlock_mapping                         221     128     -93
      radix_tree_gang_lookup_tag                   275     162    -113
      radix_tree_gang_lookup                       255     126    -129
      __lookup                                     227       -    -227
      __lookup_tag                                 271       -    -271
      radix_tree_locate_item                       277       -    -277
      
      This patch:
      
      Implement a clean, simple and effective radix-tree iteration routine.
      
      Iteration is divided into two phases:
      * looking up the next chunk in a radix-tree leaf node
      * iterating through the slots in this chunk

      The main iterator function radix_tree_next_chunk() returns a pointer to the first
      slot, and stores in the struct radix_tree_iter the index just past the last slot
      of the chunk. For tagged iteration it also constructs a bitmask of tags for the
      returned chunk. All additional logic is implemented as static-inline functions
      and macros.

      It also adds radix_tree_find_next_bit(), a static-inline variant of
      find_next_bit() optimized for small constant-size arrays, because
      find_next_bit() is too heavy for searching in an array with one or two long
      elements.
      
      [akpm@linux-foundation.org: rework comments a bit]
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Tested-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78c1d784
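The two-phase iteration described in the commit above (find the next chunk, then walk the slots inside it using a tag bitmask) can be sketched in userspace C. This is a simplified illustration, not the kernel implementation: it assumes a flat array of 64-slot chunks instead of a real tree, all `sketch_*` names are hypothetical, and `__builtin_ctzll()` is a GCC/Clang builtin used here in the role of an `__ffs()` analog.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CHUNK 64u /* slots per chunk, one tag bit per slot */

struct sketch_tree {
    size_t nchunks;
    const uint64_t *tags; /* bit b of tags[c] set: slot c*64+b is tagged */
};

struct sketch_iter {
    size_t index;      /* index of the current tagged slot */
    size_t next_index; /* index just past the last slot of this chunk */
    uint64_t tags;     /* remaining tag bits; bit 0 is slot `index` */
};

/* Phase 1: find the first chunk holding a tagged slot at or after start. */
int sketch_next_chunk(const struct sketch_tree *t, struct sketch_iter *it,
                      size_t start)
{
    for (size_t c = start / SKETCH_CHUNK; c < t->nchunks; c++) {
        size_t base = c * SKETCH_CHUNK;
        size_t off = start > base ? start - base : 0; /* always < 64 */
        uint64_t bits = t->tags[c] >> off;

        if (!bits)
            continue; /* nothing tagged in the rest of this chunk */

        unsigned skip = (unsigned)__builtin_ctzll(bits);
        it->index = base + off + skip;
        it->next_index = base + SKETCH_CHUNK;
        it->tags = bits >> skip;
        return 1;
    }
    return 0;
}

/* Phase 2: step to the next tagged slot inside the current chunk. */
int sketch_next_slot(struct sketch_iter *it)
{
    it->tags >>= 1; /* drop the bit of the slot just visited */
    if (!it->tags)
        return 0;   /* chunk exhausted */

    unsigned skip = (unsigned)__builtin_ctzll(it->tags);
    it->tags >>= skip;
    it->index += 1 + skip;
    return 1;
}

/* Collect up to max tagged indices, chunk by chunk. Returns the count. */
size_t sketch_collect_tagged(const struct sketch_tree *t, size_t *out,
                             size_t max)
{
    struct sketch_iter it;
    size_t n = 0, start = 0;

    while (n < max && sketch_next_chunk(t, &it, start)) {
        do {
            out[n++] = it.index;
        } while (n < max && sketch_next_slot(&it));
        start = it.next_index;
    }
    return n;
}
```

Keeping the per-chunk bitmask in the iterator means the inner loop touches no shared tree state, which is the property the commit message credits for the tagged-lookup speedups.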
    • D
      pidns: add reboot_pid_ns() to handle the reboot syscall · cf3f8921
      Daniel Lezcano committed
      In the case of a child pid namespace, rebooting the system does not really
      make sense.  When the pid namespace is used in conjunction with the other
      namespaces in order to create a Linux container, the reboot syscall leads
      to some problems.

      A container can reboot the host.  That can be fixed by dropping the
      CAP_SYS_BOOT capability, but then we are unable to correctly poweroff/
      halt/reboot a container, and the container stays stuck at shutdown time
      with the container's init process waiting indefinitely.

      After several attempts, no solution from userspace was found to reliably
      handle the shutdown from a container.

      This patch proposes to make the init process of the child pid namespace
      exit with a signal status set to: SIGINT if the child pid namespace
      called "halt/poweroff" and SIGHUP if the child pid namespace called
      "reboot".  When the reboot syscall is called and we are not in the initial
      pid namespace, we kill the pid namespace for "HALT", "POWEROFF",
      "RESTART", and "RESTART2".  Otherwise we return EINVAL.

      Returning EINVAL is also an easy way to check if this feature is supported
      by the kernel when invoking another 'reboot' option like CAD.

      This way the parent process of the child pid namespace knows whether it
      rebooted or not and can take the right decision.
      
      Test case:
      ==========
      
      #define _GNU_SOURCE	/* needed for clone() and CLONE_NEWPID */
      #include <alloca.h>
      #include <stdio.h>
      #include <sched.h>
      #include <unistd.h>
      #include <signal.h>
      #include <sys/reboot.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      
      #include <linux/reboot.h>
      
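      /* Child entry point: runs as init of the new pid namespace. */
      static int do_reboot(void *arg)
      {
              int *cmd = arg;

              if (reboot(*cmd))
                      printf("failed to reboot(%d): %m\n", *cmd);
              return 0;
      }

      int test_reboot(int cmd, int sig)
      {
              long stack_size = 4096;
              /* Pass the top of the stack: it grows downward on most architectures. */
              void *stack = (char *)alloca(stack_size) + stack_size;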
              int status;
              pid_t ret;
      
              ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd);
              if (ret < 0) {
                      printf("failed to clone: %m\n");
                      return -1;
              }
      
              if (wait(&status) < 0) {
                      printf("unexpected wait error: %m\n");
                      return -1;
              }
      
              if (!WIFSIGNALED(status)) {
                      printf("child process exited but was not signaled\n");
                      return -1;
              }
      
              if (WTERMSIG(status) != sig) {
                      printf("signal termination is not the one expected\n");
                      return -1;
              }
      
              return 0;
      }
      
      int main(int argc, char *argv[])
      {
              int status;
      
              status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP);
              if (status < 0)
                      return 1;
              printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n");
      
              status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP);
              if (status < 0)
                      return 1;
              printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n");
      
              status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT);
              if (status < 0)
                      return 1;
              printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n");
      
              status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT);
              if (status < 0)
                      return 1;
              printf("reboot(LINUX_REBOOT_CMD_POWER_OFF) succeed\n");
      
              status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1);
              if (status >= 0) {
                      printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n");
                      return 1;
              }
              printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n");
      
              return 0;
      }
      
      [akpm@linux-foundation.org: tweak and add comments]
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf3f8921
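The mapping that the commit above describes, from reboot command to the signal reported for the child namespace's init, can be written as a small table-like function. This is an illustrative sketch built only from the commit message; `sketch_reboot_cmd_to_signal()` is a hypothetical name, while the `LINUX_REBOOT_CMD_*` constants come from `<linux/reboot.h>`.

```c
#include <errno.h>
#include <signal.h>
#include <linux/reboot.h>

/*
 * For a reboot() call made inside a child pid namespace, return the
 * signal that the namespace's init is reported to have died from,
 * or -EINVAL for commands that are rejected there.
 */
int sketch_reboot_cmd_to_signal(unsigned int cmd)
{
    switch (cmd) {
    case LINUX_REBOOT_CMD_RESTART:
    case LINUX_REBOOT_CMD_RESTART2:
        return SIGHUP;  /* "reboot" */
    case LINUX_REBOOT_CMD_HALT:
    case LINUX_REBOOT_CMD_POWER_OFF:
        return SIGINT;  /* "halt/poweroff" */
    default:
        return -EINVAL; /* e.g. the CAD commands */
    }
}
```

A container manager could compare `WTERMSIG(status)` of the container's init against these values to decide whether to restart the container or leave it stopped, as the test case in the commit message does.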
    • S
      lib/cpumask.c: remove __any_online_cpu() · 38b93780
      Srivatsa S. Bhat committed
      __any_online_cpu() is not optimal and also unnecessary.  So, replace its
      use by faster cpumask_* operations.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      38b93780
    • G
      smp: add func to IPI cpus based on parameter func · b3a7e98e
      Gilad Ben-Yossef committed
      Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and
      calculates the cpumask of cpus to IPI by calling a function supplied as a
      parameter in order to determine whether to IPI each specific cpu.

      The function works around allocation failure of the cpumask variable in
      CONFIG_CPUMASK_OFFSTACK=y by iterating over cpus, sending one IPI at a time
      via smp_call_function_single().

      The function is useful since it allows separating the specific code that
      decides in each case whether to IPI a specific cpu for a specific request
      from the common boilerplate code of creating the mask, handling
      failures etc.
      
      [akpm@linux-foundation.org: s/gfpflags/gfp_flags/]
      [akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func']
      [akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment]
      Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Avi Kivity <avi@redhat.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.org>
      Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com>
      Cc: Milton Miller <miltonm@bga.com>
      Reviewed-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b3a7e98e
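The control flow that the commit above describes for on_each_cpu_cond() (build a mask from a predicate, then signal that mask, with a one-at-a-time fallback when the mask cannot be allocated) can be simulated in userspace. This is a sketch only: "CPUs" are array indices, an "IPI" is a plain function call, and every `sketch_*` name and signature is hypothetical rather than the kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8 /* number of simulated CPUs */

/* Simulated per-CPU state shared through the opaque info pointer. */
struct sketch_state {
    int pending[SKETCH_NR_CPUS]; /* work queued on each simulated CPU */
    int calls;                   /* how many simulated IPIs were delivered */
};

typedef bool (*sketch_cond_fn)(int cpu, void *info);
typedef void (*sketch_ipi_fn)(int cpu, void *info);

/* Example predicate: only CPUs with queued work need an IPI. */
bool sketch_has_pending(int cpu, void *info)
{
    struct sketch_state *s = info;
    return s->pending[cpu] != 0;
}

/* Example handler: drain the queued work on the target CPU. */
void sketch_drain(int cpu, void *info)
{
    struct sketch_state *s = info;
    s->pending[cpu] = 0;
    s->calls++;
}

/* Stand-in for on_each_cpu_mask(): run func for every CPU set in mask. */
void sketch_on_each_cpu_mask(uint64_t mask, sketch_ipi_fn func, void *info)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            func(cpu, info);
}

/*
 * Stand-in for on_each_cpu_cond(). mask_alloc_ok simulates whether the
 * cpumask allocation succeeded.
 */
void sketch_on_each_cpu_cond(sketch_cond_fn cond, sketch_ipi_fn func,
                             void *info, bool mask_alloc_ok)
{
    if (mask_alloc_ok) {
        uint64_t mask = 0;

        for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
            if (cond(cpu, info))
                mask |= UINT64_C(1) << cpu;
        sketch_on_each_cpu_mask(mask, func, info);
    } else {
        /* Fallback: no mask available, so signal CPUs one at a time. */
        for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
            if (cond(cpu, info))
                func(cpu, info);
    }
}
```

Both paths reach the same set of CPUs; the fallback only gives up the batching that a mask allows.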
    • G
      smp: introduce a generic on_each_cpu_mask() function · 3fc498f1
      Gilad Ben-Yossef committed
      We have lots of infrastructure in place to partition multi-core systems
      such that we have a group of CPUs that are dedicated to a specific task:
      cgroups, scheduler and interrupt affinity, and the isolcpus= boot parameter.
      Still, kernel code will at times interrupt all CPUs in the system via IPIs
      for various needs.  These IPIs are useful and cannot be avoided
      altogether, but in certain cases it is possible to interrupt only specific
      CPUs that have useful work to do and not the entire system.
      
      This patch set, inspired by discussions with Peter Zijlstra and Frederic
      Weisbecker when testing the nohz task patch set, is a first stab at trying
      to explore doing this by locating the places where such global IPI calls
      are being made and turning the global IPI into an IPI for a specific group
      of CPUs.  The purpose of the patch set is to get feedback if this is the
      right way to go for dealing with this issue and indeed, if the issue is
      even worth dealing with at all.  Based on the feedback from this patch set
      I plan to offer further patches that address similar issues in other code
      paths.

      This patch creates an on_each_cpu_mask() and on_each_cpu_cond()
      infrastructure API (the former derived from existing arch specific
      versions in Tile and Arm) and uses them to turn several global IPI
      invocations into per-CPU-group invocations.
      
      Core kernel:
      
      on_each_cpu_mask() calls a function on processors specified by cpumask,
      which may or may not include the local processor.
      
      You must not call this function with disabled interrupts or from a
      hardware interrupt handler or from a bottom half handler.
      
      arch/arm:
      
      Note that the generic version is a little different from the Arm one:

      1. It has the mask as first parameter
      2. It calls the function on the calling CPU with interrupts disabled,
         but this should be OK since the function is called on the other CPUs
         with interrupts disabled anyway.

      arch/tile:

      The API is the same as the tile private one, but the generic version
      also calls the function with interrupts disabled in the UP case.

      This is OK since the function is called on the other CPUs
      with interrupts disabled.
      Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Avi Kivity <avi@redhat.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.org>
      Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com>
      Cc: Milton Miller <miltonm@bga.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3fc498f1
    • H
      swapon: check validity of swap_flags · d15cab97
      Hugh Dickins committed
      Most system calls taking flags first check that the flags passed in are
      valid, and that helps userspace to detect when new flags are supported.
      
      But swapon never did so: start checking now, to help if we ever want to
      support more swap_flags in future.
      
      It's difficult to get stray bits set in an int, and swapon is not widely
      used, so this is most unlikely to break any userspace; but we can just
      revert if it turns out to do so.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d15cab97
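The kind of validity check the commit above introduces can be sketched as a mask test that rejects any bit outside the known flag set. This is a sketch under stated assumptions: the `SKETCH_*` constants are local stand-ins whose values are assumed for illustration, and `sketch_check_swap_flags()` is a hypothetical name, not the kernel's code.

```c
#include <errno.h>

/* Assumed flag layout, defined locally for illustration only. */
#define SKETCH_SWAP_FLAG_PRIO_MASK 0x7fffu  /* priority value bits */
#define SKETCH_SWAP_FLAG_PREFER    0x8000u  /* "priority is set" bit */
#define SKETCH_SWAP_FLAG_DISCARD   0x10000u /* discard freed swap pages */

#define SKETCH_SWAP_FLAGS_VALID                               \
    (SKETCH_SWAP_FLAG_PRIO_MASK | SKETCH_SWAP_FLAG_PREFER |   \
     SKETCH_SWAP_FLAG_DISCARD)

/* Return 0 if only known bits are set, -EINVAL otherwise. */
int sketch_check_swap_flags(int swap_flags)
{
    if ((unsigned int)swap_flags & ~SKETCH_SWAP_FLAGS_VALID)
        return -EINVAL;
    return 0;
}
```

Rejecting unknown bits up front is what lets userspace probe for a newly added flag: an old kernel answers EINVAL instead of silently ignoring it.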
    • H
      mm for fs: add truncate_pagecache_range() · 623e3db9
      Hugh Dickins committed
      Holepunching filesystems ext4 and xfs are using truncate_inode_pages_range
      but forgetting to unmap pages first (ocfs2 remembers).  This is not really
      a bug, since races already require truncate_inode_page() to handle that
      case once the page is locked; but it can be very inefficient if the file
      being punched happens to be mapped into many vmas.
      
      Provide a drop-in replacement truncate_pagecache_range() which does the
      unmapping pass first, handling the awkward mismatch between arguments to
      truncate_inode_pages_range() and arguments to unmap_mapping_range().
      
      Note that holepunching does not unmap privately COWed pages in the range:
      POSIX requires that we do so when truncating, but it's hard to justify,
      difficult to implement without an i_size cutoff, and no filesystem is
      attempting to implement it.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      623e3db9
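The "awkward mismatch between arguments" mentioned in the commit above is that one call takes an inclusive byte range (first byte, last byte) while the other takes a start and a length. A sketch of one way to convert between them follows. It assumes a 4096-byte page and assumes the unmapped region is contracted inward to whole pages; the commit message does not spell out the rounding, and all `sketch_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096) /* assumed page size */

/* Arguments in the (start, length) style. */
struct sketch_unmap_args {
    bool needed;        /* false if the hole contains no whole page */
    uint64_t holebegin; /* first byte to unmap, page aligned */
    uint64_t holelen;   /* number of bytes to unmap, page multiple */
};

static uint64_t sketch_round_up(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

static uint64_t sketch_round_down(uint64_t x)
{
    return x & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Convert an inclusive byte range [lstart, lend] into (start, length)
 * covering only the pages that lie entirely inside the hole.
 * Assumes lstart <= lend and lend < UINT64_MAX.
 */
struct sketch_unmap_args sketch_convert_range(uint64_t lstart, uint64_t lend)
{
    struct sketch_unmap_args a = { false, 0, 0 };
    uint64_t unmap_start = sketch_round_up(lstart);
    uint64_t unmap_end_excl = sketch_round_down(lend + 1); /* exclusive */

    if (unmap_end_excl > unmap_start) {
        a.needed = true;
        a.holebegin = unmap_start;
        a.holelen = unmap_end_excl - unmap_start;
    }
    return a;
}
```

For a page-aligned hole such as bytes 4096 through 12287 this yields a start of 4096 and a length of 8192; a hole smaller than one page yields nothing to unmap.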
    • M
      PM / QoS: add pm_qos_update_request_timeout() API · c4772d19
      MyungJoo Ham committed
      The new API, pm_qos_update_request_timeout(), provides a timeout
      for pm_qos_update_request().

      For example, pm_qos_update_request_timeout(req, 100, 1000) means that
      the QoS request on req with value 100 will be active for 1000 microseconds.
      After 1000 microseconds, the QoS request through req is reset. If there
      is another pm_qos_update_request(req, x) during the 1000 us, this
      new request with value x will override it, as this is another request on the
      same req handle. A new request on the same req handle will always
      override the previous request, whether it is a conventional request or
      a new timeout request.
      Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Acked-by: Mark Gross <markgross@thegnar.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      c4772d19
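The override semantics described in the commit above can be modelled with a tiny userspace simulation. This is a sketch, not the kernel API: time is passed in explicitly instead of using a timer, the reset value `SKETCH_DEFAULT_VALUE` is an assumed placeholder, and all `sketch_*` names and signatures are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DEFAULT_VALUE (-1) /* assumed "no request" value */

struct sketch_req {
    int value;           /* currently requested QoS value */
    bool timer_armed;    /* true while a timeout is pending */
    uint64_t expires_us; /* absolute expiry time of that timeout */
};

/* Conventional update: replaces the value and cancels any timeout. */
void sketch_update_request(struct sketch_req *r, int value)
{
    r->timer_armed = false;
    r->value = value;
}

/* Timed update: replaces the value and (re)arms the timeout. */
void sketch_update_request_timeout(struct sketch_req *r, int value,
                                   uint64_t timeout_us, uint64_t now_us)
{
    r->value = value;
    r->timer_armed = true;
    r->expires_us = now_us + timeout_us;
}

/* Effective value at time now_us; an expired timeout resets the request. */
int sketch_value_at(struct sketch_req *r, uint64_t now_us)
{
    if (r->timer_armed && now_us >= r->expires_us) {
        r->timer_armed = false;
        r->value = SKETCH_DEFAULT_VALUE;
    }
    return r->value;
}
```

In this model a request of 100 with a 1000 us timeout made at time 0 reads back as 100 at 500 us and as the default at 1000 us, while a conventional update made in between cancels the pending reset.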