1. 31 7月, 2012 4 次提交
  2. 30 7月, 2012 4 次提交
    • K
      fs: add link restriction audit reporting · a51d9eaa
      Kees Cook 提交于
      Adds audit messages for unexpected link restriction violations so that
      system owners will have some sort of potentially actionable information
      about misbehaving processes.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a51d9eaa
    • K
      fs: add link restrictions · 800179c9
      Kees Cook 提交于
      This adds symlink and hardlink restrictions to the Linux VFS.
      
      Symlinks:
      
      A long-standing class of security issues is the symlink-based
      time-of-check-time-of-use race, most commonly seen in world-writable
      directories like /tmp. The common method of exploitation of this flaw
      is to cross privilege boundaries when following a given symlink (i.e. a
      root process follows a symlink belonging to another user). For a likely
      incomplete list of hundreds of examples across the years, please see:
      http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
      
      The solution is to permit symlinks to only be followed when outside
      a sticky world-writable directory, or when the uid of the symlink and
      follower match, or when the directory owner matches the symlink's owner.
      
      Some pointers to the history of earlier discussion that I could find:
      
       1996 Aug, Zygo Blaxell
        http://marc.info/?l=bugtraq&m=87602167419830&w=2
       1996 Oct, Andrew Tridgell
        http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
       1997 Dec, Albert D Cahalan
        http://lkml.org/lkml/1997/12/16/4
       2005 Feb, Lorenzo Hernández García-Hierro
        http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
       2010 May, Kees Cook
        https://lkml.org/lkml/2010/5/30/144
      
      Past objections and rebuttals could be summarized as:
      
       - Violates POSIX.
         - POSIX didn't consider this situation and it's not useful to follow
           a broken specification at the cost of security.
       - Might break unknown applications that use this feature.
         - Applications that break because of the change are easy to spot and
           fix. Applications that are vulnerable to symlink ToCToU by not having
           the change aren't. Additionally, no applications have yet been found
           that rely on this behavior.
       - Applications should just use mkstemp() or O_CREATE|O_EXCL.
         - True, but applications are not perfect, and new software is written
           all the time that makes these mistakes; blocking this flaw at the
           kernel is a single solution to the entire class of vulnerability.
       - This should live in the core VFS.
         - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
       - This should live in an LSM.
         - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)
      
      Hardlinks:
      
      On systems that have user-writable directories on the same partition
      as system files, a long-standing class of security issues is the
      hardlink-based time-of-check-time-of-use race, most commonly seen in
      world-writable directories like /tmp. The common method of exploitation
      of this flaw is to cross privilege boundaries when following a given
      hardlink (i.e. a root process follows a hardlink created by another
      user). Additionally, an issue exists where users can "pin" a potentially
      vulnerable setuid/setgid file so that an administrator will not actually
      upgrade a system fully.
      
      The solution is to permit hardlinks to only be created when the user is
      already the existing file's owner, or if they already have read/write
      access to the existing file.
      
      Many Linux users are surprised when they learn they can link to files
      they have no access to, so this change appears to follow the doctrine
      of "least surprise". Additionally, this change does not violate POSIX,
      which states "the implementation may require that the calling process
      has permission to access the existing file"[1].
      
      This change is known to break some implementations of the "at" daemon,
      though the version used by Fedora and Ubuntu has been fixed[2] for
      a while. Otherwise, the change has been undisruptive while in use in
      Ubuntu for the last 1.5 years.
      
      [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
      [2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279
      
      This patch is based on the patches in Openwall and grsecurity, along with
      suggestions from Al Viro. I have added a sysctl to enable the protected
      behavior, and documentation.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      800179c9
    • A
      consolidate pipe file creation · e4fad8e5
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e4fad8e5
    • A
      new helper: done_path_create() · 921a1650
      Al Viro 提交于
      releases what needs to be released after {kern,user}_path_create()
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      921a1650
  3. 23 7月, 2012 9 次提交
  4. 14 7月, 2012 15 次提交
  5. 12 7月, 2012 5 次提交
    • Y
      memblock: free allocated memblock_reserved_regions later · 29f67386
      Yinghai Lu 提交于
      memblock_free_reserved_regions() calls memblock_free(), but
      memblock_free() would double reserved.regions too, so we could free the
      old range for reserved.regions.
      
      Also tj said there is another bug which could be related to this.
      
      | I don't think we're saving any noticeable
      | amount by doing this "free - give it to page allocator - reserve
      | again" dancing.  We should just allocate regions aligned to page
      | boundaries and free them later when memblock is no longer in use.
      
      in that case, when DEBUG_PAGEALLOC, will get panic:
      
           memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
        BUG: unable to handle kernel paging request at ffff88102febd948
        IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
        PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        CPU 0
        Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
        RIP: 0010:[<ffffffff836a5774>]  [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
      
      See the discussion at https://lkml.org/lkml/2012/6/13/469
      
      So try to allocate with PAGE_SIZE alignment and free it later.
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29f67386
    • Y
      mm: sparse: fix usemap allocation above node descriptor section · 99ab7b19
      Yinghai Lu 提交于
      After commit f5bf18fa ("bootmem/sparsemem: remove limit constraint
      in alloc_bootmem_section"), usemap allocations may easily be placed
      outside the optimal section that holds the node descriptor, even if
      there is space available in that section.  This results in unnecessary
      hotplug dependencies that need to have the node unplugged before the
      section holding the usemap.
      
      The reason is that the bootmem allocator doesn't guarantee a linear
      search starting from the passed allocation goal but may start out at a
      much higher address absent an upper limit.
      
      Fix this by trying the allocation with the limit at the section end,
      then retry without if that fails.  This keeps the fix from f5bf18fa
      of not panicking if the allocation does not fit in the section, but
      still makes sure to try to stay within the section at first.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[3.3.x, 3.4.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99ab7b19
    • J
      memory hotplug: fix invalid memory access caused by stale kswapd pointer · d8adde17
      Jiang Liu 提交于
      kswapd_stop() is called to destroy the kswapd work thread when all memory
      of a NUMA node has been offlined.  But kswapd_stop() only terminates the
      work thread without resetting NODE_DATA(nid)->kswapd to NULL.  The stale
      pointer will prevent kswapd_run() from creating a new work thread when
      adding memory to the memory-less NUMA node again.  Eventually the stale
      pointer may cause invalid memory access.
      
      An example stack dump as below. It's reproduced with 2.6.32, but latest
      kernel has the same issue.
      
        BUG: unable to handle kernel NULL pointer dereference at (null)
        IP: [<ffffffff81051a94>] exit_creds+0x12/0x78
        PGD 0
        Oops: 0000 [#1] SMP
        last sysfs file: /sys/devices/system/memory/memory391/state
        CPU 11
        Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod
        Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285
        RIP: 0010:exit_creds+0x12/0x78
        RSP: 0018:ffff8806044f1d78  EFLAGS: 00010202
        RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502
        RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
        RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10
        R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000
        R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001
        FS:  00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600)
        Stack:
         ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e
         ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1
         0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003
        Call Trace:
          __put_task_struct+0x5d/0x97
          kthread_stop+0x50/0x58
          offline_pages+0x324/0x3da
          memory_block_change_state+0x179/0x1db
          store_mem_state+0x9e/0xbb
          sysfs_write_file+0xd0/0x107
          vfs_write+0xad/0x169
          sys_write+0x45/0x6e
          system_call_fastpath+0x16/0x1b
        Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0
        RIP  exit_creds+0x12/0x78
         RSP <ffff8806044f1d78>
        CR2: 0000000000000000
      
      [akpm@linux-foundation.org: add pglist_data.kswapd locking comments]
      Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d8adde17
    • T
      timekeeping: Provide hrtimer update function · f6c06abf
      Thomas Gleixner 提交于
      To finally fix the infamous leap second issue and other race windows
      caused by functions which change the offsets between the various time
      bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
      function which atomically gets the current monotonic time and updates
      the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
      overhead. The previous patch which provides ktime_t offsets allows us
      to make this function almost as cheap as ktime_get() which is going to
      be replaced in hrtimer_interrupt().
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      f6c06abf
    • J
      hrtimer: Provide clock_was_set_delayed() · f55a6faa
      John Stultz 提交于
      clock_was_set() cannot be called from hard interrupt context because
      it calls on_each_cpu().
      
      For fixing the widely reported leap seconds issue it is necessary to
      call it from hard interrupt context, i.e. the timer tick code, which
      does the timekeeping updates.
      
      Provide a new function which denotes it in the hrtimer cpu base
      structure of the cpu on which it is called and raise the hrtimer
      softirq. We then execute the clock_was_set() notificiation from
      softirq context in run_hrtimer_softirq(). The hrtimer softirq is
      rarely used, so polling the flag there is not a performance issue.
      
      [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
        rid of all this ifdeffery ASAP ]
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Reported-by: NJan Engelhardt <jengelh@inai.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      f55a6faa
  6. 11 7月, 2012 1 次提交
    • A
      PCI: EHCI: fix crash during suspend on ASUS computers · dbf0e4c7
      Alan Stern 提交于
      Quite a few ASUS computers experience a nasty problem, related to the
      EHCI controllers, when going into system suspend.  It was observed
      that the problem didn't occur if the controllers were not put into the
      D3 power state before starting the suspend, and commit
      151b6128 (USB: EHCI: fix crash during
      suspend on ASUS computers) was created to do this.
      
      It turned out this approach messed up other computers that didn't have
      the problem -- it prevented USB wakeup from working.  Consequently
      commit c2fb8a3f (USB: add
      NO_D3_DURING_SLEEP flag and revert 151b6128) was merged; it
      reverted the earlier commit and added a whitelist of known good board
      names.
      
      Now we know the actual cause of the problem.  Thanks to AceLan Kao for
      tracking it down.
      
      According to him, an engineer at ASUS explained that some of their
      BIOSes contain a bug that was added in an attempt to work around a
      problem in early versions of Windows.  When the computer goes into S3
      suspend, the BIOS tries to verify that the EHCI controllers were first
      quiesced by the OS.  Nothing's wrong with this, but the BIOS does it
      by checking that the PCI COMMAND registers contain 0 without checking
      the controllers' power state.  If the register isn't 0, the BIOS
      assumes the controller needs to be quiesced and tries to do so.  This
      involves making various MMIO accesses to the controller, which don't
      work very well if the controller is already in D3.  The end result is
      a system hang or memory corruption.
      
      Since the value in the PCI COMMAND register doesn't matter once the
      controller has been suspended, and since the value will be restored
      anyway when the controller is resumed, we can work around the BIOS bug
      simply by setting the register to 0 during system suspend.  This patch
      (as1590) does so and also reverts the second commit mentioned above,
      which is now unnecessary.
      
      In theory we could do this for every PCI device.  However to avoid
      introducing new problems, the patch restricts itself to EHCI host
      controllers.
      
      Finally the affected systems can suspend with USB wakeup working
      properly.
      
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=37632
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42728Based-on-patch-by: NAceLan Kao <acelan.kao@canonical.com>
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Tested-by: NDâniel Fraga <fragabr@gmail.com>
      Tested-by: NJavier Marcet <jmarcet@gmail.com>
      Tested-by: NAndrey Rahmatullin <wrar@wrar.name>
      Tested-by: NOleksij Rempel <bug-track@fisher-privat.net>
      Tested-by: NPavel Pisa <pisa@cmp.felk.cvut.cz>
      Cc: stable <stable@vger.kernel.org>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbf0e4c7
  7. 07 7月, 2012 1 次提交
  8. 05 7月, 2012 1 次提交