1. 15 Mar, 2012 1 commit
  2. 14 Mar, 2012 5 commits
  3. 11 Mar, 2012 2 commits
    • xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it. · 73c154c6
      Konrad Rzeszutek Wilk committed
      For the hypervisor to take advantage of the MWAIT support it needs
      to extract from the ACPI _CST the register address. But the
      hypervisor does not have the support to parse DSDT so it relies on
      the initial domain (dom0) to parse the ACPI Power Management information
      and push it up to the hypervisor. The pushing of the data is done
      by the processor_harveset_xen module which parses the information that
      the ACPI parser has graciously exposed in 'struct acpi_processor'.
      
      For the ACPI parser to also expose the Cx states for MWAIT, we need
      to expose the MWAIT capability (leaf 1). Furthermore we also need to
      expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly
      function.
      
      The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX
      operations, but it can't do it since it needs to be backwards compatible.
      Instead we choose to use the native CPUID to figure out if the MWAIT
      capability exists and use the XEN_SET_PDC query hypercall to figure out
      if the hypervisor wants us to expose the MWAIT_LEAF capability or not.
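
As a rough illustration of the native-CPUID half of that check, the sketch below tests the MONITOR/MWAIT feature bit (bit 3 of ECX from CPUID leaf 1) and compares the highest basic leaf against the MWAIT leaf (5). The register values are passed in as plain integers, the function and constant names are hypothetical, and the XEN_SET_PDC hypercall is not modeled; this is not the kernel's code.

```python
# Illustrative only: names are hypothetical, not taken from the kernel source.
CPUID_MWAIT_LEAF = 5          # MONITOR/MWAIT leaf
MWAIT_FEATURE_BIT = 1 << 3    # CPUID.01H:ECX bit 3 (MONITOR/MWAIT)


def has_mwait(leaf1_ecx: int) -> bool:
    """Return True if ECX from CPUID leaf 1 advertises MONITOR/MWAIT."""
    return bool(leaf1_ecx & MWAIT_FEATURE_BIT)


def has_mwait_leaf(max_basic_leaf: int) -> bool:
    """Return True if the CPU reports basic leaves up to the MWAIT leaf."""
    return max_basic_leaf >= CPUID_MWAIT_LEAF
```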
      
      Note: The XEN_SET_PDC query was implemented in c/s 23783:
      "ACPI: add _PDC input override mechanism".
      
      With this in place, instead of
       C3 ACPI IOPORT 415
      we get now
       C3:ACPI FFH INTEL MWAIT 0x20
      
      Note: The cpu_idle which would be calling the mwait variants for idling
      never gets set b/c we set the default pm_idle to be the hypercall variant.
      Acked-by: Jan Beulich <JBeulich@suse.com>
      [v2: Fix missing header file include and #ifdef]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      73c154c6
    • xen/setup/pm/acpi: Remove the call to boot_option_idle_override. · cc7335b2
      Konrad Rzeszutek Wilk committed
      We needed that call in the past to force the kernel to use
      default_idle (which called safe_halt, which called xen_safe_halt).
      
      But set_pm_idle_to_default() now does that, so there is no need
      to use this boot option operand.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cc7335b2
  4. 27 Feb, 2012 1 commit
  5. 25 Jan, 2012 1 commit
  6. 13 Jan, 2012 3 commits
  7. 12 Jan, 2012 3 commits
    • Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci · 7b67e751
      Linus Torvalds committed
      * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci: (80 commits)
        x86/PCI: Expand the x86_msi_ops to have a restore MSIs.
        PCI: Increase resource array mask bit size in pcim_iomap_regions()
        PCI: DEVICE_COUNT_RESOURCE should be equal to PCI_NUM_RESOURCES
        PCI: pci_ids: add device ids for STA2X11 device (aka ConneXT)
        PNP: work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB
        x86/PCI: amd: factor out MMCONFIG discovery
        PCI: Enable ATS at the device state restore
        PCI: msi: fix imbalanced refcount of msi irq sysfs objects
        PCI: kconfig: English typo in pci/pcie/Kconfig
        PCI/PM/Runtime: make PCI traces quieter
        PCI: remove pci_create_bus()
        xtensa/PCI: convert to pci_scan_root_bus() for correct root bus resources
        x86/PCI: convert to pci_create_root_bus() and pci_scan_root_bus()
        x86/PCI: use pci_scan_bus() instead of pci_scan_bus_parented()
        x86/PCI: read Broadcom CNB20LE host bridge info before PCI scan
        sparc32, leon/PCI: convert to pci_scan_root_bus() for correct root bus resources
        sparc/PCI: convert to pci_create_root_bus()
        sh/PCI: convert to pci_scan_root_bus() for correct root bus resources
        powerpc/PCI: convert to pci_create_root_bus()
        powerpc/PCI: split PHB part out of pcibios_map_io_space()
        ...
      
      Fix up conflicts in drivers/pci/msi.c and include/linux/pci_regs.h due
      to the same patches being applied in other branches.
      7b67e751
    • cpu: Register a generic CPU device on architectures that currently do not · 9f13a1fd
      Ben Hutchings committed
      frv, h8300, m68k, microblaze, openrisc, score, um and xtensa currently
      do not register a CPU device.  Add the config option GENERIC_CPU_DEVICES
      which causes a generic CPU device to be registered for each present CPU,
      and make all these architectures select it.
      
      Richard Weinberger <richard@nod.at> covered UML and suggested using
      per_cpu.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f13a1fd
    • cpu: Do not return errors from cpu_dev_init() which will be ignored · 024f7846
      Ben Hutchings committed
      cpu_dev_init() is only called from driver_init(), which does not check
      its return value.  Therefore make cpu_dev_init() return void.
      
      We must register the CPU subsystem, so panic if this fails.
      
      If sched_create_sysfs_power_savings_entries() fails, the damage is
      contained, so ignore this (as before).
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      024f7846
  8. 11 Jan, 2012 24 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 4f58cb90
      Linus Torvalds committed
      * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (54 commits)
        crypto: gf128mul - remove leftover "(EXPERIMENTAL)" in Kconfig
        crypto: serpent-sse2 - remove unneeded LRW/XTS #ifdefs
        crypto: serpent-sse2 - select LRW and XTS
        crypto: twofish-x86_64-3way - remove unneeded LRW/XTS #ifdefs
        crypto: twofish-x86_64-3way - select LRW and XTS
        crypto: xts - remove dependency on EXPERIMENTAL
        crypto: lrw - remove dependency on EXPERIMENTAL
        crypto: picoxcell - fix boolean and / or confusion
        crypto: caam - remove DECO access initialization code
        crypto: caam - fix polarity of "propagate error" logic
        crypto: caam - more desc.h cleanups
        crypto: caam - desc.h - convert spaces to tabs
        crypto: talitos - convert talitos_error to struct device
        crypto: talitos - remove NO_IRQ references
        crypto: talitos - fix bad kfree
        crypto: convert drivers/crypto/* to use module_platform_driver()
        char: hw_random: convert drivers/char/hw_random/* to use module_platform_driver()
        crypto: serpent-sse2 - should select CRYPTO_CRYPTD
        crypto: serpent - rename serpent.c to serpent_generic.c
        crypto: serpent - cleanup checkpatch errors and warnings
        ...
      4f58cb90
    • Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-security · e7691a1c
      Linus Torvalds committed
      * 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: (32 commits)
        ima: fix invalid memory reference
        ima: free duplicate measurement memory
        security: update security_file_mmap() docs
        selinux: Casting (void *) value returned by kmalloc is useless
        apparmor: fix module parameter handling
        Security: tomoyo: add .gitignore file
        tomoyo: add missing rcu_dereference()
        apparmor: add missing rcu_dereference()
        evm: prevent racing during tfm allocation
        evm: key must be set once during initialization
        mpi/mpi-mpow: NULL dereference on allocation failure
        digsig: build dependency fix
        KEYS: Give key types their own lockdep class for key->sem
        TPM: fix transmit_cmd error logic
        TPM: NSC and TIS drivers X86 dependency fix
        TPM: Export wait_for_stat for other vendor specific drivers
        TPM: Use vendor specific function for status probe
        tpm_tis: add delay after aborting command
        tpm_tis: Check return code from getting timeouts/durations
        tpm: Introduce function to poll for result of self test
        ...
      
      Fix up trivial conflict in lib/Makefile due to addition of CONFIG_MPI
      and DIGSIG next to CONFIG_DQL addition.
      e7691a1c
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 5cd9599b
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        autofs4: deal with autofs4_write/autofs4_write races
        autofs4: catatonic_mode vs. notify_daemon race
        autofs4: autofs4_wait() vs. autofs4_catatonic_mode() race
        hfsplus: creation of hidden dir on mount can fail
        block_dev: Suppress bdev_cache_init() kmemleak warninig
        fix shrink_dcache_parent() livelock
        coda: switch coda_cnode_make() to sane API as well, clean coda_lookup()
        coda: deal correctly with allocation failure from coda_cnode_makectl()
        securityfs: fix object creation races
      5cd9599b
    • autofs4: deal with autofs4_write/autofs4_write races · d668dc56
      Al Viro committed
      Just serialize the actual writing of packets into pipe on
      a new mutex, independent from everything else in the locking
      hierarchy.  As soon as something has started feeding a piece
      of packet into the pipe to daemon, we *want* everything else
      about to try the same to wait until we are done.
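
The idea of a dedicated lock around the pipe write can be sketched in user space as follows. This is only an analogy in Python, not the autofs4 code: `pipe_lock` and `write_packet` are hypothetical names, and a `threading.Lock` stands in for the new kernel mutex.

```python
import os
import threading

# Hypothetical stand-in for the new mutex described above.
pipe_lock = threading.Lock()


def write_packet(fd: int, packet: bytes) -> None:
    """Write one whole packet to fd while holding pipe_lock."""
    with pipe_lock:
        view = memoryview(packet)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
```

Because the lock is held across the whole loop, a second writer cannot interleave its bytes into a packet that is only partly written.
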
      Acked-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d668dc56
    • autofs4: catatonic_mode vs. notify_daemon race · 87533332
      Al Viro committed
      we need to hold ->wq_mutex while we are forming the packet to send,
      lest we have autofs4_catatonic_mode() setting wq->name.name to NULL
      just as autofs4_notify_daemon() decides to memcpy() from it...
      
      We do have a check for catatonic mode immediately after that (under
      ->wq_mutex, as it ought to be) and packet won't be actually sent,
      but it'll be too late for us if we oops on that memcpy() from NULL...
      
      Fix is obvious - just extend the area covered by ->wq_mutex over
      that switch and check whether it's catatonic *before* doing anything
      else.
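
A toy model of that ordering, assuming a Python class in place of the kernel structures: the catatonic flag is tested first, under the same lock that protects the name, so the name is never read after it has been cleared. `WaitQueueModel` and its methods are hypothetical names for illustration only.

```python
import threading
from typing import Optional


class WaitQueueModel:
    """Toy model: 'name' may be cleared when the mount goes catatonic."""

    def __init__(self, name: bytes) -> None:
        self.mutex = threading.Lock()   # plays the role of ->wq_mutex
        self.catatonic = False
        self.name: Optional[bytes] = name

    def go_catatonic(self) -> None:
        with self.mutex:
            self.catatonic = True
            self.name = None

    def build_packet(self) -> Optional[bytes]:
        # Check the flag *before* reading name, all under the same lock.
        with self.mutex:
            if self.catatonic:
                return None
            return b"pkt:" + self.name
```
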
      Acked-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      87533332
    • autofs4: autofs4_wait() vs. autofs4_catatonic_mode() race · 4041bcdc
      Al Viro committed
      We need to recheck ->catatonic after autofs4_wait() got ->wq_mutex
      for good, or we might end up with wq inserted into queue after
      autofs4_catatonic_mode() had done its thing.  It will stick there
      forever, since there won't be anything to clear its ->name.name.
      
      A bit of a complication: validate_request() drops and regains ->wq_mutex.
      It actually ends up the most convenient place to stick the check into...
      Acked-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      4041bcdc
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · e343a895
      Linus Torvalds committed
      lib: use generic pci_iomap on all architectures
      
      Many architectures don't want to pull in iomap.c,
      so they ended up duplicating pci_iomap from that file.
      That function isn't trivial, and we are going to modify it
      https://lkml.org/lkml/2011/11/14/183
      so the duplication hurts.
      
      This reduces the scope of the problem significantly,
      by moving pci_iomap to a separate file and
      referencing that from all architectures.
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        alpha: drop pci_iomap/pci_iounmap from pci-noop.c
        mn10300: switch to GENERIC_PCI_IOMAP
        mn10300: add missing __iomap markers
        frv: switch to GENERIC_PCI_IOMAP
        tile: switch to GENERIC_PCI_IOMAP
        tile: don't panic on iomap
        sparc: switch to GENERIC_PCI_IOMAP
        sh: switch to GENERIC_PCI_IOMAP
        powerpc: switch to GENERIC_PCI_IOMAP
        parisc: switch to GENERIC_PCI_IOMAP
        mips: switch to GENERIC_PCI_IOMAP
        microblaze: switch to GENERIC_PCI_IOMAP
        arm: switch to GENERIC_PCI_IOMAP
        alpha: switch to GENERIC_PCI_IOMAP
        lib: add GENERIC_PCI_IOMAP
        lib: move GENERIC_IOMAP to lib/Kconfig
      
      Fix up trivial conflicts due to changes nearby in arch/{m68k,score}/Kconfig
      e343a895
    • Merge tag 'for-linux-3.3-merge-window' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming · 06792c4d
      Linus Torvalds committed
      * tag 'for-linux-3.3-merge-window' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming: (29 commits)
        C6X: replace tick_nohz_stop/restart_sched_tick calls
        C6X: add register_cpu call
        C6X: deal with memblock API changes
        C6X: fix timer64 initialization
        C6X: fix layout of EMIFA registers
        C6X: MAINTAINERS
        C6X: DSCR - Device State Configuration Registers
        C6X: EMIF - External Memory Interface
        C6X: general SoC support
        C6X: library code
        C6X: headers
        C6X: ptrace support
        C6X: loadable module support
        C6X: cache control
        C6X: clocks
        C6X: build infrastructure
        C6X: syscalls
        C6X: interrupt handling
        C6X: time management
        C6X: signal management
        ...
      06792c4d
    • Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze · 4690dfa8
      Linus Torvalds committed
      * 'next' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: Wire-up new system calls
        microblaze: Remove NO_IRQ from architecture
        input: xilinx_ps2: Don't use NO_IRQ
        block: xsysace: Don't use NO_IRQ
        microblaze: Trivial asm fix
        microblaze: Fix debug message in module
        microblaze: Remove eprintk macro
        microblaze: Send CR before LF for early console
        microblaze: Change NO_IRQ to 0
        microblaze: Use irq_of_parse_and_map for timer
        microblaze: intc: Change variable name
        microblaze: Use of_find_compatible_node for timer and intc
        microblaze: Add __cmpdi2
        microblaze: Synchronize __pa __va macros
      4690dfa8
    • Merge branch 'unicore32' of git://github.com/gxt/linux · c2e08e7c
      Linus Torvalds committed
      * 'unicore32' of git://github.com/gxt/linux:
        rtc-puv3: solve section mismatch in rtc-puv3.c
        rtc-puv3: using module_platform_driver()
        i2c-puv3: using module_platform_driver()
        rtc-puv3: irq: remove IRQF_DISABLED
        unicore32: Remove IRQF_DISABLED
        unicore32: Use set_current_blocked()
        unicore32: add ioremap_nocache definition
        unicore32: delete specified xlate_dev_mem_ptr
        of: add include asm/setup.h in drivers/of/fdt.c
        unicore32: standardize /proc/iomem "Kernel code" name
      c2e08e7c
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin · 28190145
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin:
        blackfin: bf561: add adv7183 capture support
        blackfin: bf537: add capture support
        blackfin: bf548: add capture support
        blackfin: time-ts: rm unused func broadcast_timer_setup()
        blackfin: i2c-lcd: change default clock rate
        blackfin: mac: dsa: add vlan mask in board file
        blackfin: bf537: change num_chipselect for spi-sport
        blackfin: serial: bfin-uart: remove unused field
        bf54x: get mem size: missing break in switch
        blackfin: smp: fix msg queue overflow issue
        blackfin: config: update macro SPI_BFIN in board file
        blackfin: config: update def config for all boards
        blackfin: smp: cleanup smp code
        blackfin: smp: add suspend and wakeup irq flags
        blackfin: bf533-stamp: add missed patches for new asoc driver
        blackfin: bf533-stamp: fix ad1836 name
      28190145
    • Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux · 001a541e
      Linus Torvalds committed
      * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
        writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
        writeback: balanced_rate cannot exceed write bandwidth
        writeback: do strict bdi dirty_exceeded
        writeback: avoid tiny dirty poll intervals
        writeback: max, min and target dirty pause time
        writeback: dirty ratelimit - think time compensation
        btrfs: fix dirtied pages accounting on sub-page writes
        writeback: fix dirtied pages accounting on redirty
        writeback: fix dirtied pages accounting on sub-page writes
        writeback: charge leaked page dirties to active tasks
        writeback: Include all dirty inodes in background writeback
      001a541e
    • Merge branch 'akpm' (aka "Andrew's patch-bomb") · 40ba5879
      Linus Torvalds committed
      Andrew elucidates:
       - First installment of MM.  We have a HUGE number of MM patches this
         time.  It's crazy.
       - MAINTAINERS updates
       - backlight updates
       - leds
       - checkpatch updates
       - misc ELF stuff
       - rtc updates
       - reiserfs
       - procfs
       - some misc other bits
      
      * akpm: (124 commits)
        user namespace: make signal.c respect user namespaces
        workqueue: make alloc_workqueue() take printf fmt and args for name
        procfs: add hidepid= and gid= mount options
        procfs: parse mount options
        procfs: introduce the /proc/<pid>/map_files/ directory
        procfs: make proc_get_link to use dentry instead of inode
        signal: add block_sigmask() for adding sigmask to current->blocked
        sparc: make SA_NOMASK a synonym of SA_NODEFER
        reiserfs: don't lock root inode searching
        reiserfs: don't lock journal_init()
        reiserfs: delay reiserfs lock until journal initialization
        reiserfs: delete comments referring to the BKL
        drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range
        drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030
        drivers/rtc/: remove redundant spi driver bus initialization
        drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static
        drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static
        rtc: convert drivers/rtc/* to use module_platform_driver()
        drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc()
        drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
        ...
      40ba5879
    • user namespace: make signal.c respect user namespaces · 6b550f94
      Serge E. Hallyn committed
      ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
      user namespace. (new, thanks Oleg)
      
      __send_signal: convert current's uid to the recipient's user namespace
      for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
      again :)
      
      do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
      user namespace
      
      ptrace_signal maps parent's uid into current's user namespace before
      including in signal to current.  IIUC Oleg has argued that this shouldn't
      matter as the debugger will play with it, but it seems like not converting
      the value currently being set is misleading.
      
      Changelog:
      Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
      	simplify callers and help make clear what we are translating
              (which uid into which namespace).  Passing the target task would
      	make callers even easier to read, but we pass in user_ns because
      	current_user_ns() != task_cred_xxx(current, user_ns).
      Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
      	in ptrace_signal().
      Sep 23: In send_signal(), detect when (user) signal is coming from an
      	ancestor or unrelated user namespace.  Pass that on to __send_signal,
      	which sets si_uid to 0 or overflowuid if needed.
      Oct 12: Base on Oleg's fixup_uid() patch.  On top of that, handle all
      	SI_FROMKERNEL cases at callers, because we can't assume sender is
      	current in those cases.
      Nov 10: (mhelsley) rename fixup_uid to more meaningful userns_fixup_signal_uid
      Nov 10: (akpm) make the !CONFIG_USER_NS case clearer
      Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Serge Hallyn <serge.hallyn@canonical.com>
      Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)
      
      Eric Biederman pointed out that passing info is a bug and could lead to a
      NULL pointer deref to boot.
      
      A collection of signal, securebits, filecaps, cap_bounds, and a few other
      ltp tests passed with this kernel.
      
      Changelog:
          Nov 18: previous patch missed a leading '&'
      Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Dan Carpenter <dan.carpenter@oracle.com>
      Subject: ipc/mqueue: lock() => unlock() typo
      
      There was a double lock typo introduced in b085f4bd6b21 "user namespace:
      make signal.c respect user namespaces"
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6b550f94
    • workqueue: make alloc_workqueue() take printf fmt and args for name · b196be89
      Tejun Heo committed
      alloc_workqueue() currently expects the passed in @name pointer to remain
      accessible.  This is inconvenient and a bit silly given that the whole wq
      is being dynamically allocated.  This patch updates alloc_workqueue() and
      friends to take printf format string instead of opaque string and matching
      varargs at the end.  The name is allocated together with the wq and
      formatted.
      
      alloc_ordered_workqueue() is converted to a macro to unify varargs
      handling with alloc_workqueue(), and, while at it, add comment to
      alloc_workqueue().
      
      None of the current in-kernel users pass in string with '%' as constant
      name and this change shouldn't cause any problem.
      
      [akpm@linux-foundation.org: use __printf]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b196be89
    • procfs: add hidepid= and gid= mount options · 0499680a
      Vasiliy Kulikov committed
      Add support for mount options to restrict access to /proc/PID/
      directories.  The default backward-compatible "relaxed" behaviour is left
      untouched.
      
      The first mount option is called "hidepid" and its value defines how much
      info about processes we want to be available for non-owners:
      
      hidepid=0 (default) means the old behavior - anybody may read all
      world-readable /proc/PID/* files.
      
      hidepid=1 means users may not access any /proc/<pid>/ directories, but
      their own.  Sensitive files like cmdline, sched*, status are now protected
      against other users.  As permission checking is done in
      proc_pid_permission() and files' permissions are left untouched,
      programs expecting specific files' modes are not confused.
      
      hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
      users.  It doesn't mean that it hides whether a process exists (it can be
      learned by other means, e.g.  by kill -0 $PID), but it hides process' euid
      and egid.  It complicates an intruder's task of gathering info about running
      processes, whether some daemon runs with elevated privileges, whether
      another user runs some sensitive program, whether other users run any
      program at all, etc.
      
      gid=XXX defines a group that will be able to gather all processes' info
      (as in hidepid=0 mode).  This group should be used instead of putting
      nonroot user in sudoers file or something.  However, untrusted users (like
      daemons, etc.) which are not supposed to monitor the tasks in the whole
      system should not be added to the group.
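
The three hidepid levels and the gid= exemption can be summarized as a small decision function. This is a simplified model of the rules as described above, not the kernel's permission check: "privileged" is reduced to uid 0 or owning the process, and all names (`Access`, `proc_pid_access`) are hypothetical.

```python
from enum import Enum


class Access(Enum):
    FULL = "full"            # directory visible and readable
    DENIED = "denied"        # directory listed, contents not accessible
    INVISIBLE = "invisible"  # directory not listed at all


def proc_pid_access(hidepid, viewer_uid, owner_uid, viewer_gids=(), gid=None):
    """Simplified model of the hidepid=/gid= rules described above."""
    # Assumption: "privileged" is reduced to uid 0 or owning the process.
    if hidepid == 0 or viewer_uid == 0 or viewer_uid == owner_uid:
        return Access.FULL
    # Members of the gid= group see everything, as in hidepid=0 mode.
    if gid is not None and gid in viewer_gids:
        return Access.FULL
    return Access.DENIED if hidepid == 1 else Access.INVISIBLE
```

For example, with hidepid=2 and gid=1001, a user in group 1001 gets full access to another user's /proc/PID/, while a user outside that group does not see the directory at all.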
      
      hidepid=1 or higher is designed to restrict access to procfs files, which
      might reveal some sensitive private information like precise keystrokes
      timings:
      
      http://www.openwall.com/lists/oss-security/2011/11/05/3
      
      hidepid=1/2 doesn't break monitoring userspace tools.  ps, top, pgrep, and
      conky gracefully handle EPERM/ENOENT and behave as if the current user is
      the only user running processes.  pstree shows the process subtree which
      contains "pstree" process.
      
      Note: the patch doesn't deal with setuid/setgid issues of keeping
      preopened descriptors of procfs files (like
      https://lkml.org/lkml/2011/2/7/368).  We rely on the fact that the
      leaked information, like the scheduling counters of setuid apps, doesn't
      threaten anybody's privacy - only the user who started the setuid program
      may read the counters.
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Theodore Tso <tytso@MIT.EDU>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0499680a
    • procfs: parse mount options · 97412950
      Vasiliy Kulikov committed
      Add support for procfs mount options.  Actual mount options are coming in
      the next patches.
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Theodore Tso <tytso@MIT.EDU>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      97412950
    • procfs: introduce the /proc/<pid>/map_files/ directory · 640708a2
      Pavel Emelyanov committed
      This one behaves similarly to the /proc/<pid>/fd/ one - it contains
      one symlink for each file-backed mapping; the name of a symlink is
      "vma->vm_start-vma->vm_end", the target is the file.  Opening a symlink
      results in a file that points to exactly the same inode as the vma's one.
      
      For example the ls -l of some arbitrary /proc/<pid>/map_files/
      
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -> /lib64/libc-2.5.so
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -> /lib64/libselinux.so.1
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -> /lib64/libacl.so.1.1.0
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -> /lib64/librt-2.5.so
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -> /lib64/ld-2.5.so
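
Entry names in the listing above follow the "start-end" pattern with hexadecimal addresses. A minimal sketch of how a user-space tool might split such a name into integers is given below; `parse_map_files_name` is a hypothetical helper, not part of any existing tool.

```python
from typing import Tuple


def parse_map_files_name(name: str) -> Tuple[int, int]:
    """Split a 'start-end' entry name into integer addresses."""
    start_hex, end_hex = name.split("-")
    start, end = int(start_hex, 16), int(end_hex, 16)
    if start >= end:
        raise ValueError(f"empty or inverted range: {name!r}")
    return start, end
```

On a kernel that provides the directory, the names could be obtained with `os.listdir()` on /proc/<pid>/map_files/ and the targets with `os.readlink()`, subject to the usual permission checks.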
      
      This *helps* the checkpointing process in three ways:
      
      1. When dumping a task's mappings we know the exact file that is mapped
         by a particular region.  We do this by opening the
         /proc/$pid/map_files/$address symlink the way we do with file
         descriptors.
      
      2. This also helps in determining which anonymous shared mappings are
         shared with each other by comparing the inodes of them.
      
      3. When restoring a set of processes, in case two of them share a
         mapping, we map the memory by the 1st one and then open its
         /proc/$pid/map_files/$address file and map it by the 2nd task.
      
      Using /proc/$pid/maps for this is quite inconvenient since it requires
      repeatedly re-reading and reparsing this text file, which slows down
      the restore procedure significantly.  Also, as pointed out in (3), it is
      much easier to use a top-level shared mapping in children as
      /proc/$pid/map_files/$address when needed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [gorcunov@openvz.org: make map_files depend on CHECKPOINT_RESTORE]
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Reviewed-by: Vasiliy Kulikov <segoon@openwall.com>
      Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      640708a2
    • procfs: make proc_get_link to use dentry instead of inode · 7773fbc5
      Cyrill Gorcunov committed
      Prepare the ground for the next "map_files" patch which needs a name of a
      link file to analyse.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7773fbc5
    • signal: add block_sigmask() for adding sigmask to current->blocked · 5e6292c0
      Matt Fleming committed
      Abstract the code sequence for adding a signal handler's sa_mask to
      current->blocked because the sequence is identical for all architectures.
      Furthermore, in the past some architectures actually got this code wrong,
      so introduce a wrapper that all architectures can use.
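
The sequence being wrapped can be modeled with Python sets. This is a sketch under assumptions, not the kernel function: signal masks are plain sets of signal numbers, the SA_NODEFER handling reflects the conventional signal-delivery behaviour rather than anything stated in this commit message, and the function signature is hypothetical.

```python
SA_NODEFER = 0x40000000  # value used on most Linux architectures; assumption here


def block_sigmask(blocked: set, sa_mask: set, sa_flags: int, signr: int) -> set:
    """Return the new blocked set after a handler for signr is set up."""
    # Add the handler's sa_mask to the currently blocked signals.
    new_blocked = blocked | sa_mask
    if not sa_flags & SA_NODEFER:
        # Without SA_NODEFER the signal being handled is blocked too.
        new_blocked.add(signr)
    return new_blocked
```
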
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5e6292c0
    • sparc: make SA_NOMASK a synonym of SA_NODEFER · f350b177
      Matt Fleming committed
      Unlike other architectures, sparc currently has no SA_NODEFER definition
      but only the older SA_NOMASK.  Since SA_NOMASK is the historical name for
      SA_NODEFER, add SA_NODEFER and copy what other architectures do by making
      SA_NOMASK a synonym for SA_NODEFER.
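As a rough illustration of the aliasing described above, the older name can simply be defined in terms of the newer one. The `DEMO_` names and the flag value below are hypothetical and are not taken from the sparc headers.

```c
#include <assert.h>

/* Illustrative flag value only: hypothetical, not copied from the sparc headers. */
#define DEMO_SA_NODEFER 0x20u

/* The older name becomes a plain alias of the newer one. */
#define DEMO_SA_NOMASK DEMO_SA_NODEFER

/* Returns 1 if the delivered signal should be blocked while its handler runs. */
static int demo_blocks_own_signal(unsigned int sa_flags)
{
    return (sa_flags & DEMO_SA_NODEFER) == 0;
}

/* Code using either spelling behaves identically. */
static void demo_check(void)
{
    assert(DEMO_SA_NOMASK == DEMO_SA_NODEFER);
    assert(!demo_blocks_own_signal(DEMO_SA_NOMASK));
}
```

With the alias in place, generic code can test for the newer name on every architecture while existing users of the older name keep compiling.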
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f350b177
    • F
      reiserfs: don't lock root inode searching · 9b467e6e
      Frederic Weisbecker committed
      Nothing requires that we lock the filesystem until the root inode is
      provided.
      
      Also, iget5_locked() triggers a warning because we are holding the
      filesystem lock while allocating the inode, which results in a lockdep
      suspicion that we have a lock inversion against the reclaim path:
      
      [ 1986.896979] =================================
      [ 1986.896990] [ INFO: inconsistent lock state ]
      [ 1986.896997] 3.1.1-main #8
      [ 1986.897001] ---------------------------------
      [ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      [ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 1986.897023]  (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
      [ 1986.897044] {RECLAIM_FS-ON-W} state was registered at:
      [ 1986.897050]   [<c014a5b9>] mark_held_locks+0xae/0xd0
      [ 1986.897060]   [<c014aab3>] lockdep_trace_alloc+0x7d/0x91
      [ 1986.897068]   [<c0190ee0>] kmem_cache_alloc+0x1a/0x93
      [ 1986.897078]   [<c01e7728>] reiserfs_alloc_inode+0x13/0x3d
      [ 1986.897088]   [<c01a5b06>] alloc_inode+0x14/0x5f
      [ 1986.897097]   [<c01a5cb9>] iget5_locked+0x62/0x13a
      [ 1986.897106]   [<c01e99e0>] reiserfs_fill_super+0x410/0x8b9
      [ 1986.897114]   [<c01953da>] mount_bdev+0x10b/0x159
      [ 1986.897123]   [<c01e764d>] get_super_block+0x10/0x12
      [ 1986.897131]   [<c0195b38>] mount_fs+0x59/0x12d
      [ 1986.897138]   [<c01a80d1>] vfs_kern_mount+0x45/0x7a
      [ 1986.897147]   [<c01a83e3>] do_kern_mount+0x2f/0xb0
      [ 1986.897155]   [<c01a987a>] do_mount+0x5c2/0x612
      [ 1986.897163]   [<c01a9a72>] sys_mount+0x61/0x8f
      [ 1986.897170]   [<c044060c>] sysenter_do_call+0x12/0x32
      [ 1986.897181] irq event stamp: 7509691
      [ 1986.897186] hardirqs last  enabled at (7509691): [<c0190f34>] kmem_cache_alloc+0x6e/0x93
      [ 1986.897197] hardirqs last disabled at (7509690): [<c0190eea>] kmem_cache_alloc+0x24/0x93
      [ 1986.897209] softirqs last  enabled at (7508896): [<c01294bd>] __do_softirq+0xee/0xfd
      [ 1986.897222] softirqs last disabled at (7508859): [<c01030ed>] do_softirq+0x50/0x9d
      [ 1986.897234]
      [ 1986.897235] other info that might help us debug this:
      [ 1986.897242]  Possible unsafe locking scenario:
      [ 1986.897244]
      [ 1986.897250]        CPU0
      [ 1986.897254]        ----
      [ 1986.897257]   lock(&REISERFS_SB(s)->lock);
      [ 1986.897265] <Interrupt>
      [ 1986.897269]     lock(&REISERFS_SB(s)->lock);
      [ 1986.897276]
      [ 1986.897277]  *** DEADLOCK ***
      [ 1986.897278]
      [ 1986.897286] no locks held by kswapd0/16.
      [ 1986.897291]
      [ 1986.897292] stack backtrace:
      [ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8
      [ 1986.897306] Call Trace:
      [ 1986.897314]  [<c0439e76>] ? printk+0xf/0x11
      [ 1986.897324]  [<c01482d1>] print_usage_bug+0x20e/0x21a
      [ 1986.897332]  [<c01479b8>] ? print_irq_inversion_bug+0x172/0x172
      [ 1986.897341]  [<c014855c>] mark_lock+0x27f/0x483
      [ 1986.897349]  [<c0148d88>] __lock_acquire+0x628/0x1472
      [ 1986.897358]  [<c0149fae>] lock_acquire+0x47/0x5e
      [ 1986.897366]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
      [ 1986.897384]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
      [ 1986.897397]  [<c043b5ef>] mutex_lock_nested+0x35/0x26f
      [ 1986.897409]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
      [ 1986.897421]  [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
      [ 1986.897433]  [<c01e2edd>] map_block_for_writepage+0xc9/0x590
      [ 1986.897448]  [<c01b1706>] ? create_empty_buffers+0x33/0x8f
      [ 1986.897461]  [<c0121124>] ? get_parent_ip+0xb/0x31
      [ 1986.897472]  [<c043ef7f>] ? sub_preempt_count+0x81/0x8e
      [ 1986.897485]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
      [ 1986.897496]  [<c0121124>] ? get_parent_ip+0xb/0x31
      [ 1986.897508]  [<c01e355d>] reiserfs_writepage+0x1b9/0x3e7
      [ 1986.897521]  [<c0173b40>] ? clear_page_dirty_for_io+0xcb/0xde
      [ 1986.897533]  [<c014a6e3>] ? trace_hardirqs_on_caller+0x108/0x138
      [ 1986.897546]  [<c014a71e>] ? trace_hardirqs_on+0xb/0xd
      [ 1986.897559]  [<c0177b38>] shrink_page_list+0x34f/0x5e2
      [ 1986.897572]  [<c01780a7>] shrink_inactive_list+0x172/0x22c
      [ 1986.897585]  [<c0178464>] shrink_zone+0x303/0x3b1
      [ 1986.897597]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
      [ 1986.897611]  [<c01788c9>] kswapd+0x3b7/0x5f2
      
      The deadlock shouldn't happen: we are doing that allocation in the
      mount path, where the filesystem is not yet available for any reclaim.
      Still, the warning is annoying.
      
      To solve this, acquire the lock later, only where we need it: right before
      calling reiserfs_read_locked_inode(), which needs the lock to walk the tree.
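The reordering can be modelled with a toy lock flag. This is only a sketch of the idea, not the reiserfs code: every `model_` and `mount_` name is hypothetical, the lock is a plain boolean, and `alloc_seen_under_lock` stands in for the condition that lockdep reports.

```c
#include <assert.h>
#include <stdbool.h>

static bool fs_lock_held;           /* models the per-superblock reiserfs lock */
static bool alloc_seen_under_lock;  /* models what a lockdep-like checker flags */

static void model_write_lock(void)   { assert(!fs_lock_held); fs_lock_held = true; }
static void model_write_unlock(void) { assert(fs_lock_held);  fs_lock_held = false; }

/* Models iget5_locked(): it allocates, so it may enter reclaim. */
static void model_iget(void)
{
    if (fs_lock_held)
        alloc_seen_under_lock = true;
}

/* Models reiserfs_read_locked_inode(): walks the tree, so it needs the lock. */
static bool model_read_inode(void)
{
    return fs_lock_held;
}

/* Old ordering: the lock is held across the allocation. */
static bool mount_old(void)
{
    model_write_lock();
    model_iget();
    bool ok = model_read_inode();
    model_write_unlock();
    return ok;
}

/* New ordering: allocate first, take the lock only for the tree walk. */
static bool mount_new(void)
{
    model_iget();
    model_write_lock();
    bool ok = model_read_inode();
    model_write_unlock();
    return ok;
}
```

Both orderings satisfy the tree walk's need for the lock; only the old one performs the allocation with the lock held.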
      Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9b467e6e
    • F
      reiserfs: don't lock journal_init() · 37c69b98
      Frederic Weisbecker committed
      journal_init() doesn't need the lock since no operation on the filesystem
      is involved there.  journal_read() and get_list_bitmap() have yet to be
      reviewed carefully, though, before removing the lock there.  Just keep
      it around these two calls for safety.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      37c69b98
    • F
      reiserfs: delay reiserfs lock until journal initialization · f32485be
      Frederic Weisbecker committed
      In the mount path, transactions that are made before journal
      initialization don't involve the filesystem.  We can delay the reiserfs
      lock until we play with the journal.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f32485be