1. 25 9月, 2012 5 次提交
  2. 21 9月, 2012 1 次提交
    • M
      xfrm_user: ensure user supplied esn replay window is valid · ecd79187
      Mathias Krause 提交于
      The current code fails to ensure that the netlink message actually
      contains as many bytes as the header indicates. If a user creates a new
      state or updates an existing one but does not supply the bytes for the
      whole ESN replay window, the kernel copies random heap bytes into the
      replay bitmap, the ones happen to follow the XFRMA_REPLAY_ESN_VAL
      netlink attribute. This leads to following issues:
      
      1. The replay window has random bits set confusing the replay handling
         code later on.
      
      2. A malicious user could use this flaw to leak up to ~3.5kB of heap
         memory when she has access to the XFRM netlink interface (requires
         CAP_NET_ADMIN).
      
      Known users of the ESN replay window are strongSwan and Steffen's
      iproute2 patch (<http://patchwork.ozlabs.org/patch/85962/>). The latter
      uses the interface with a bitmap supplied while the former does not.
      strongSwan is therefore prone to run into issue 1.
      
      To fix both issues without breaking existing userland allow using the
      XFRMA_REPLAY_ESN_VAL netlink attribute with either an empty bitmap or a
      fully specified one. For the former case we initialize the in-kernel
      bitmap with zero, for the latter we copy the user supplied bitmap. For
      state updates the full bitmap must be supplied.
      
      To prevent overflows in the bitmap length calculation the maximum size
      of bmp_len is limited to 128 by this patch -- resulting in a maximum
      replay window of 4096 packets. This should be sufficient for all real
      life scenarios (RFC 4303 recommends a default replay window size of 64).
      
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Martin Willi <martin@revosec.ch>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ecd79187
  3. 19 9月, 2012 2 次提交
    • G
      linux/kernel.h: Fix warning seen with W=1 due to change in DIV_ROUND_CLOSEST · 263a523d
      Guenter Roeck 提交于
      After commit b6d86d3d (Fix DIV_ROUND_CLOSEST to support negative dividends),
      the following warning is seen if the kernel is compiled with W=1 (-Wextra):
      
      warning: comparison of unsigned expression >= 0 is always true
      
      The warning is due to the test '((typeof(x))-1) >= 0', which is used to detect
      if the variable type is unsigned. Research on the web suggests that the warning
      disappears if '>' instead of '>=' is used for the comparison.
      
      Tests after changing the macro along that line show that the warning is gone,
      and that the result is still correct:
      
      i=-4: DIV_ROUND_CLOSEST(i, 2)=-2
      i=-3: DIV_ROUND_CLOSEST(i, 2)=-2
      i=-2: DIV_ROUND_CLOSEST(i, 2)=-1
      i=-1: DIV_ROUND_CLOSEST(i, 2)=-1
      i=0: DIV_ROUND_CLOSEST(i, 2)=0
      i=1: DIV_ROUND_CLOSEST(i, 2)=1
      i=2: DIV_ROUND_CLOSEST(i, 2)=1
      i=3: DIV_ROUND_CLOSEST(i, 2)=2
      i=4: DIV_ROUND_CLOSEST(i, 2)=2
      
      Code size is the same as before.
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Acked-by: NJean Delvare <khali@linux-fr.org>
      263a523d
    • M
      vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill() · b161dfa6
      Miklos Szeredi 提交于
      IBM reported a soft lockup after applying the fix for the rename_lock
      deadlock.  Commit c83ce989 ("VFS: Fix the nfs sillyrename regression
      in kernel 2.6.38") was found to be the culprit.
      
      The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
      dentry was killed.  This flag can be set on non-killed dentries too,
      which results in infinite retries when trying to traverse the dentry
      tree.
      
      This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
      only set in d_kill() and makes try_to_ascend() test only this flag.
      
      IBM reported successful test results with this patch.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b161dfa6
  4. 18 9月, 2012 2 次提交
    • A
      compiler.h: add __visible · 9a858dc7
      Andi Kleen 提交于
      gcc 4.6+ has support for a externally_visible attribute that prevents the
      optimizer from optimizing unused symbols away.  Add a __visible macro to
      use it with that compiler version or later.
      
      This is used (at least) by the "Link Time Optimization" patchset.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a858dc7
    • J
      mm/ia64: fix a memory block size bug · 05cf9639
      Jianguo Wu 提交于
      I found following definition in include/linux/memory.h, in my IA64
      platform, SECTION_SIZE_BITS is equal to 32, and MIN_MEMORY_BLOCK_SIZE
      will be 0.
      
        #define MIN_MEMORY_BLOCK_SIZE     (1 << SECTION_SIZE_BITS)
      
      Because MIN_MEMORY_BLOCK_SIZE is int type and length of 32bits,
      so MIN_MEMORY_BLOCK_SIZE(1 << 32) will will equal to 0.
      Actually when SECTION_SIZE_BITS >= 31, MIN_MEMORY_BLOCK_SIZE will be wrong.
      This will cause wrong system memory infomation in sysfs.
      I think it should be:
      
        #define MIN_MEMORY_BLOCK_SIZE     (1UL << SECTION_SIZE_BITS)
      
      And "echo offline > memory0/state" will cause following call trace:
      
        kernel BUG at mm/memory_hotplug.c:885!
        sh[6455]: bugcheck! 0 [1]
        Pid: 6455, CPU 0, comm:                   sh
        psr : 0000101008526030 ifs : 8000000000000fa4 ip  : [<a0000001008c40f0>]    Not tainted (3.6.0-rc1)
        ip is at offline_pages+0x210/0xee0
        Call Trace:
          show_stack+0x80/0xa0
          show_regs+0x640/0x920
          die+0x190/0x2c0
          die_if_kernel+0x50/0x80
          ia64_bad_break+0x3d0/0x6e0
          ia64_native_leave_kernel+0x0/0x270
          offline_pages+0x210/0xee0
          alloc_pages_current+0x180/0x2a0
      Signed-off-by: NJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05cf9639
  5. 17 9月, 2012 1 次提交
  6. 16 9月, 2012 1 次提交
    • M
      mfd: core: Push irqdomain mapping out into devices · 0848c94f
      Mark Brown 提交于
      Currently the MFD core supports remapping MFD cell interrupts using an
      irqdomain but only if the MFD is being instantiated using device tree
      and only if the device tree bindings use the pattern of registering IPs
      in the device tree with compatible properties.  This will be actively
      harmful for drivers which support non-DT platforms and use this pattern
      for their DT bindings as it will mean that the core will silently change
      remapping behaviour and it is also limiting for drivers which don't do
      DT with this particular pattern.  There is also a potential fragility if
      there are interrupts not associated with MFD cells and all the cells are
      omitted from the device tree for some reason.
      
      Instead change the code to take an IRQ domain as an optional argument,
      allowing drivers to take the decision about the parent domain for their
      interrupts.  The one current user of this feature is ab8500-core, it has
      the domain lookup pushed out into the driver.
      Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
      0848c94f
  7. 14 9月, 2012 1 次提交
  8. 13 9月, 2012 1 次提交
  9. 12 9月, 2012 1 次提交
    • R
      i2c: pnx: Fix read transactions of >= 2 bytes · c076ada4
      Roland Stigge 提交于
      On transactions with n>=2 bytes, the controller actually wrongly clocks in n+1
      bytes. This is caused by the (wrong) assumption that RFE in the Status Register
      is 1 iff there is no byte already ordered (via a dummy TX byte). This lead to
      the implementation of synchronized byte ordering, e.g.:
      
      Dummy-TX - RX - Dummy-TX - RX - ...
      
      But since RFE actually stays high after some Dummy-TX, it rather looks like:
      
      Dummy-TX - Dummy-TX - RX - Dummy-TX - RX - (RX)
      
      The last RX byte is clocked in by the bus controller, but ignored by the kernel
      when filling the userspace buffer.
      
      This patch fixes the issue by asking for RX via Dummy-TX asynchronously.
      Introducing a separate counter for TX bytes.
      Signed-off-by: NRoland Stigge <stigge@antcom.de>
      Signed-off-by: NWolfram Sang <w.sang@pengutronix.de>
      c076ada4
  10. 10 9月, 2012 3 次提交
  11. 08 9月, 2012 2 次提交
  12. 07 9月, 2012 2 次提交
    • T
      SUNRPC: Fix a UDP transport regression · f39c1bfb
      Trond Myklebust 提交于
      Commit 43cedbf0 (SUNRPC: Ensure that
      we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
      hangs in the case of NFS over UDP mounts.
      
      Since neither the UDP or the RDMA transport mechanism use dynamic slot
      allocation, we can skip grabbing the socket lock for those transports.
      Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
      case.
      
      Note that the NFSv4.1 back channel assigns the slot directly
      through rpc_run_bc_task, so we can ignore that case.
      Reported-by: NDick Streefland <dick.streefland@altium.nl>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org [>= 3.1]
      f39c1bfb
    • B
      kobject: fix oops with "input0: bad kobj_uevent_env content in show_uevent()" · 60e233a5
      Bjørn Mork 提交于
      Fengguang Wu <fengguang.wu@intel.com> writes:
      
      > After the __devinit* removal series, I can still get kernel panic in
      > show_uevent(). So there are more sources of bug..
      >
      > Debug patch:
      >
      > @@ -343,8 +343,11 @@ static ssize_t show_uevent(struct device
      >                 goto out;
      >
      >         /* copy keys to file */
      > -       for (i = 0; i < env->envp_idx; i++)
      > +       dev_err(dev, "uevent %d env[%d]: %s/.../%s\n", env->buflen, env->envp_idx, top_kobj->name, dev->kobj.name);
      > +       for (i = 0; i < env->envp_idx; i++) {
      > +               printk(KERN_ERR "uevent %d env[%d]: %s\n", (int)count, i, env->envp[i]);
      >                 count += sprintf(&buf[count], "%s\n", env->envp[i]);
      > +       }
      >
      > Oops message, the env[] is again not properly initilized:
      >
      > [   44.068623] input input0: uevent 61 env[805306368]: input0/.../input0
      > [   44.069552] uevent 0 env[0]: (null)
      
      This is a completely different CONFIG_HOTPLUG problem, only
      demonstrating another reason why CONFIG_HOTPLUG should go away.  I had a
      hard time trying to disable it anyway ;-)
      
      The problem this time is lots of code assuming that a call to
      add_uevent_var() will guarantee that env->buflen > 0.  This is not true
      if CONFIG_HOTPLUG is unset.  So things like this end up overwriting
      env->envp_idx because the array index is -1:
      
      	if (add_uevent_var(env, "MODALIAS="))
      		return -ENOMEM;
              len = input_print_modalias(&env->buf[env->buflen - 1],
      				   sizeof(env->buf) - env->buflen,
      				   dev, 0);
      
      Don't know what the best action is, given that there seem to be a *lot*
      of this around the kernel.  This patch "fixes" the problem for me, but I
      don't know if it can be considered an appropriate fix.
      
      [ It is the correct fix for now, for 3.7 forcing CONFIG_HOTPLUG to
      always be on is the longterm fix, but it's too late for 3.6 and older
      kernels to resolve this that way - gregkh ]
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60e233a5
  13. 06 9月, 2012 2 次提交
  14. 05 9月, 2012 2 次提交
  15. 04 9月, 2012 2 次提交
  16. 02 9月, 2012 2 次提交
  17. 01 9月, 2012 1 次提交
  18. 31 8月, 2012 2 次提交
  19. 29 8月, 2012 3 次提交
  20. 23 8月, 2012 1 次提交
    • A
      ARM: omap: allow building omap44xx without SMP · c7a9b09b
      Arnd Bergmann 提交于
      The new omap4 cpuidle implementation currently requires
      ARCH_NEEDS_CPU_IDLE_COUPLED, which only works on SMP.
      
      This patch makes it possible to build a non-SMP kernel
      for that platform. This is not normally desired for
      end-users but can be useful for testing.
      
      Without this patch, building rand-0y2jSKT results in:
      
      drivers/cpuidle/coupled.c: In function 'cpuidle_coupled_poke':
      drivers/cpuidle/coupled.c:317:3: error: implicit declaration of function '__smp_call_function_single' [-Werror=implicit-function-declaration]
      
      It's not clear if this patch is the best solution for
      the problem at hand. I have made sure that we can now
      build the kernel in all configurations, but that does
      not mean it will actually work on an OMAP44xx.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Tony Lindgren <tony@atomide.com>
      c7a9b09b
  21. 22 8月, 2012 3 次提交
    • A
      introduce kref_put_mutex() · 8ad5db8a
      Al Viro 提交于
      equivalent of
      	mutex_lock(mutex);
      	if (!kref_put(kref, release))
      		mutex_unlock(mutex);
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8ad5db8a
    • A
      mfd: Move tps65217 regulator plat data handling to regulator · 1922b0f2
      AnilKumar Ch 提交于
      Regulator platform data handling was mistakenly added to MFD
      driver. So we will see build errors if we compile MFD drivers
      without CONFIG_REGULATOR. This patch moves regulator platform
      data handling from TPS65217 MFD driver to regulator driver.
      
      This makes MFD driver independent of REGULATOR framework so
      build error is fixed if CONFIG_REGULATOR is not set.
      
      drivers/built-in.o: In function `tps65217_probe':
      tps65217.c:(.devinit.text+0x13e37): undefined reference
      to `of_regulator_match'
      
      This patch also fix allocation size of tps65217 platform data.
      Current implementation allocates a struct tps65217_board for each
      regulator specified in the device tree. But the structure itself
      provides array of regulators so one instance of it is sufficient.
      Signed-off-by: NAnilKumar Ch <anilkumar@ti.com>
      1922b0f2
    • M
      mm: compaction: Abort async compaction if locks are contended or taking too long · c67fe375
      Mel Gorman 提交于
      Jim Schutt reported a problem that pointed at compaction contending
      heavily on locks.  The workload is straight-forward and in his own words;
      
      	The systems in question have 24 SAS drives spread across 3 HBAs,
      	running 24 Ceph OSD instances, one per drive.  FWIW these servers
      	are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
      	Ceph Linux clients doing dd simultaneously to a Ceph file system
      	backed by 12 of these servers.
      
      Early in the test everything looks fine
      
        procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
         r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
        31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0
        27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
        28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
         6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0
        22 18          0     226984        576   38596404    0    0     3 2445741 223114 179527  12 43  23 22  0
      
      and then it goes to pot
      
        procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
         r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
        163  8          0     464308        576   36791368    0    0    11 22210  866  536   3 13  79  4  0
        207 14          0     917752        576   36181928    0    0   712 1345376 134598 47367   7 90   1  2  0
        123 12          0     685516        576   36296148    0    0   429 1386615 158494 60077   8 84   5  3  0
        123 12          0     598572        576   36333728    0    0  1107 1233281 147542 62351   7 84   5  4  0
        622  7          0     660768        576   36118264    0    0   557 1345548 151394 59353   7 85   4  3  0
        223 11          0     283960        576   36463868    0    0    46 1107160 121846 33006   6 93   1  1  0
      
      Note that system CPU usage is very high blocks being written out has
      dropped by 42%. He analysed this with perf and found
      
        perf record -g -a sleep 10
        perf report --sort symbol --call-graph fractal,5
          34.63%  [k] _raw_spin_lock_irqsave
                  |
                  |--97.30%-- isolate_freepages
                  |          compaction_alloc
                  |          unmap_and_move
                  |          migrate_pages
                  |          compact_zone
                  |          compact_zone_order
                  |          try_to_compact_pages
                  |          __alloc_pages_direct_compact
                  |          __alloc_pages_slowpath
                  |          __alloc_pages_nodemask
                  |          alloc_pages_vma
                  |          do_huge_pmd_anonymous_page
                  |          handle_mm_fault
                  |          do_page_fault
                  |          page_fault
                  |          |
                  |          |--87.39%-- skb_copy_datagram_iovec
                  |          |          tcp_recvmsg
                  |          |          inet_recvmsg
                  |          |          sock_recvmsg
                  |          |          sys_recvfrom
                  |          |          system_call
                  |          |          __recv
                  |          |          |
                  |          |           --100.00%-- (nil)
                  |          |
                  |           --12.61%-- memcpy
                   --2.70%-- [...]
      
      There was other data but primarily it is all showing that compaction is
      contended heavily on the zone->lock and zone->lru_lock.
      
      commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
      while isolating pages for migration] noted that it was possible for
      migration to hold the lru_lock for an excessive amount of time. Very
      broadly speaking this patch expands the concept.
      
      This patch introduces compact_checklock_irqsave() to check if a lock
      is contended or the process needs to be scheduled. If either condition
      is true then async compaction is aborted and the caller is informed.
      The page allocator will fail a THP allocation if compaction failed due
      to contention. This patch also introduces compact_trylock_irqsave()
      which will acquire the lock only if it is not contended and the process
      does not need to schedule.
      Reported-by: NJim Schutt <jaschut@sandia.gov>
      Tested-by: NJim Schutt <jaschut@sandia.gov>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c67fe375