1. 08 10月, 2012 16 次提交
    • D
      X.509: Add a crypto key parser for binary (DER) X.509 certificates · c26fd69f
      David Howells 提交于
      Add a crypto key parser for binary (DER) encoded X.509 certificates.  The
      certificate is parsed and, if possible, the signature is verified.
      
      An X.509 key can be added like this:
      
      	# keyctl padd crypto bar @s </tmp/x509.cert
      	15768135
      
      and displayed like this:
      
      	# cat /proc/keys
      	00f09a47 I--Q---     1 perm 39390000     0     0 asymmetri bar: X509.RSA e9fd6d08 []
      
      Note that this only works with binary certificates.  PEM encoded certificates
      are ignored by the parser.
      
      Note also that the X.509 key ID is not congruent with the PGP key ID, but for
      the moment, they will match.
      
      If a NULL or "" name is given to add_key(), then the parser will generate a key
      description from the CertificateSerialNumber and Name fields of the
      TBSCertificate:
      
      	00aefc4e I--Q---     1 perm 39390000     0     0 asymmetri bfbc0cd76d050ea4:/C=GB/L=Cambridge/O=Red Hat/CN=kernel key: X509.RSA 0c688c7b []
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      c26fd69f
    • D
      MPILIB: Provide a function to read raw data into an MPI · e1045992
      David Howells 提交于
      Provide a function to read raw data of a predetermined size into an MPI rather
      than expecting the size to be encoded within the data.  The data is assumed to
      represent an unsigned integer, and the resulting MPI will be positive.
      
      The function looks like this:
      
      	MPI mpi_read_raw_data(const void *, size_t);
      
      This is useful for reading ASN.1 integer primitives where the length is encoded
      in the ASN.1 metadata.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      e1045992
    • D
      X.509: Add an ASN.1 decoder · 42d5ec27
      David Howells 提交于
      Add an ASN.1 BER/DER/CER decoder.  This uses the bytecode from the ASN.1
      compiler in the previous patch to inform it as to what to expect to find in the
      encoded byte stream.  The output from the compiler also tells it what functions
      to call on what tags, thus allowing the caller to retrieve information.
      
      The decoder is called as follows:
      
      	int asn1_decoder(const struct asn1_decoder *decoder,
      			 void *context,
      			 const unsigned char *data,
      			 size_t datalen);
      
      The decoder argument points to the bytecode from the ASN.1 compiler.  context
      is the caller's context and is passed to the action functions.  data and
      datalen define the byte stream to be decoded.
      
      Note that the decoder is currently limited to datalen being less than 64K.
      This reduces the amount of stack space used by the decoder because ASN.1 is a
      nested construct.  Similarly, the decoder is limited to a maximum of 10 levels
      of constructed data outside of a leaf node also in an effort to keep stack
      usage down.
      
      These restrictions can be raised if necessary.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      42d5ec27
    • D
      X.509: Add simple ASN.1 grammar compiler · 4520c6a4
      David Howells 提交于
      Add a simple ASN.1 grammar compiler.  This produces a bytecode output that can
      be fed to a decoder to inform the decoder how to interpret the ASN.1 stream it
      is trying to parse.
      
      Action functions can be specified in the grammar by interpolating:
      
      	({ foo })
      
      after a type, for example:
      
      	SubjectPublicKeyInfo ::= SEQUENCE {
      		algorithm		AlgorithmIdentifier,
      		subjectPublicKey	BIT STRING ({ do_key_data })
      		}
      
      The decoder is expected to call these after matching this type and parsing the
      contents if it is a constructed type.
      
      The grammar compiler does not currently support the SET type (though it does
      support SET OF) as I can't see a good way of tracking which members have been
      encountered yet without using up extra stack space.
      
      Currently, the grammar compiler will fail if more than 256 bytes of bytecode
      would be produced or more than 256 actions have been specified as it uses
      8-bit jump values and action indices to keep space usage down.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      4520c6a4
    • D
      X.509: Add utility functions to render OIDs as strings · 4f73175d
      David Howells 提交于
      Add a pair of utility functions to render OIDs as strings.  The first takes an
      encoded OID and turns it into a "a.b.c.d" form string:
      
      	int sprint_oid(const void *data, size_t datasize,
      		       char *buffer, size_t bufsize);
      
      The second takes an OID enum index and calls the first on the data held
      therein:
      
      	int sprint_OID(enum OID oid, char *buffer, size_t bufsize);
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      4f73175d
    • D
      X.509: Implement simple static OID registry · a77ad6ea
      David Howells 提交于
      Implement a simple static OID registry that allows the mapping of an encoded
      OID to an enum value for ease of use.
      
      The OID registry index enum appears in the:
      
      	linux/oid_registry.h
      
      header file.  A script generates the registry from lines in the header file
      that look like:
      
      	<sp*>OID_foo,<sp*>/*<sp*>1.2.3.4<sp*>*/
      
      The actual OID is taken to be represented by the numbers with interpolated
      dots in the comment.
      
      All other lines in the header are ignored.
      
      The registry is queries by calling:
      
      	OID look_up_oid(const void *data, size_t datasize);
      
      This returns a number from the registry enum representing the OID if found or
      OID__NR if not.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      a77ad6ea
    • D
      RSA: Fix signature verification for shorter signatures · 0b1568a4
      David Howells 提交于
      gpg can produce a signature file where length of signature is less than the
      modulus size because the amount of space an MPI takes up is kept as low as
      possible by discarding leading zeros.  This regularly happens for several
      modules during the build.
      
      Fix it by relaxing check in RSA verification code.
      
      Thanks to Tomas Mraz and Miloslav Trmac for help.
      Signed-off-by: NMilan Broz <mbroz@redhat.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      0b1568a4
    • D
      RSA: Implement signature verification algorithm [PKCS#1 / RFC3447] · 612e0fe9
      David Howells 提交于
      Implement RSA public key cryptography [PKCS#1 / RFC3447].  At this time, only
      the signature verification algorithm is supported.  This uses the asymmetric
      public key subtype to hold its key data.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      612e0fe9
    • D
      MPILIB: Reinstate mpi_cmp[_ui]() and export for RSA signature verification · 12f008b6
      David Howells 提交于
      Reinstate and export mpi_cmp() and mpi_cmp_ui() from the MPI library for use by
      RSA signature verification as per RFC3447 section 5.2.2 step 1.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      12f008b6
    • D
      KEYS: Provide signature verification with an asymmetric key · 4ae71c1d
      David Howells 提交于
      Provide signature verification using an asymmetric-type key to indicate the
      public key to be used.
      
      The API is a single function that can be found in crypto/public_key.h:
      
      	int verify_signature(const struct key *key,
      			     const struct public_key_signature *sig)
      
      The first argument is the appropriate key to be used and the second argument
      is the parsed signature data:
      
      	struct public_key_signature {
      		u8 *digest;
      		u16 digest_size;
      		enum pkey_hash_algo pkey_hash_algo : 8;
      		union {
      			MPI mpi[2];
      			struct {
      				MPI s;		/* m^d mod n */
      			} rsa;
      			struct {
      				MPI r;
      				MPI s;
      			} dsa;
      		};
      	};
      
      This should be filled in prior to calling the function.  The hash algorithm
      should already have been called and the hash finalised and the output should
      be in a buffer pointed to by the 'digest' member.
      
      Any extra data to be added to the hash by the hash format (eg. PGP) should
      have been added by the caller prior to finalising the hash.
      
      It is assumed that the signature is made up of a number of MPI values.  If an
      algorithm becomes available for which this is not the case, the above structure
      will have to change.
      
      It is also assumed that it will have been checked that the signature algorithm
      matches the key algorithm.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      4ae71c1d
    • D
      KEYS: Asymmetric public-key algorithm crypto key subtype · a9681bf3
      David Howells 提交于
      Add a subtype for supporting asymmetric public-key encryption algorithms such
      as DSA (FIPS-186) and RSA (PKCS#1 / RFC1337).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      a9681bf3
    • D
      KEYS: Asymmetric key pluggable data parsers · 46c6f177
      David Howells 提交于
      The instantiation data passed to the asymmetric key type are expected to be
      formatted in some way, and there are several possible standard ways to format
      the data.
      
      The two obvious standards are OpenPGP keys and X.509 certificates.  The latter
      is especially useful when dealing with UEFI, and the former might be useful
      when dealing with, say, eCryptfs.
      
      Further, it might be desirable to provide formatted blobs that indicate
      hardware is to be accessed to retrieve the keys or that the keys live
      unretrievably in a hardware store, but that the keys can be used by means of
      the hardware.
      
      From userspace, the keys can be loaded using the keyctl command, for example,
      an X.509 binary certificate:
      
      	keyctl padd asymmetric foo @s <dhowells.pem
      
      or a PGP key:
      
      	keyctl padd asymmetric bar @s <dhowells.pub
      
      or a pointer into the contents of the TPM:
      
      	keyctl add asymmetric zebra "TPM:04982390582905f8" @s
      
      Inside the kernel, pluggable parsers register themselves and then get to
      examine the payload data to see if they can handle it.  If they can, they get
      to:
      
        (1) Propose a name for the key, to be used it the name is "" or NULL.
      
        (2) Specify the key subtype.
      
        (3) Provide the data for the subtype.
      
      The key type asks the parser to do its stuff before a key is allocated and thus
      before the name is set.  If successful, the parser stores the suggested data
      into the key_preparsed_payload struct, which will be either used (if the key is
      successfully created and instantiated or updated) or discarded.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      46c6f177
    • D
      KEYS: Implement asymmetric key type · 964f3b3b
      David Howells 提交于
      Create a key type that can be used to represent an asymmetric key type for use
      in appropriate cryptographic operations, such as encryption, decryption,
      signature generation and signature verification.
      
      The key type is "asymmetric" and can provide access to a variety of
      cryptographic algorithms.
      
      Possibly, this would be better as "public_key" - but that has the disadvantage
      that "public key" is an overloaded term.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      964f3b3b
    • D
      KEYS: Document asymmetric key type · 9a83b465
      David Howells 提交于
      In-source documentation for the asymmetric key type.  This will be located in:
      
      	Documentation/crypto/asymmetric-keys.txt
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      9a83b465
    • D
      MPILIB: Provide count_leading/trailing_zeros() based on arch functions · aacf29bf
      David Howells 提交于
      Provide count_leading/trailing_zeros() macros based on extant arch bit scanning
      functions rather than reimplementing from scratch in MPILIB.
      
      Whilst we're at it, turn count_foo_zeros(n, x) into n = count_foo_zeros(x).
      
      Also move the definition to asm-generic as other people may be interested in
      using it.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      aacf29bf
    • D
      KEYS: Add payload preparsing opportunity prior to key instantiate or update · cf7f601c
      David Howells 提交于
      Give the key type the opportunity to preparse the payload prior to the
      instantiation and update routines being called.  This is done with the
      provision of two new key type operations:
      
      	int (*preparse)(struct key_preparsed_payload *prep);
      	void (*free_preparse)(struct key_preparsed_payload *prep);
      
      If the first operation is present, then it is called before key creation (in
      the add/update case) or before the key semaphore is taken (in the update and
      instantiate cases).  The second operation is called to clean up if the first
      was called.
      
      preparse() is given the opportunity to fill in the following structure:
      
      	struct key_preparsed_payload {
      		char		*description;
      		void		*type_data[2];
      		void		*payload;
      		const void	*data;
      		size_t		datalen;
      		size_t		quotalen;
      	};
      
      Before the preparser is called, the first three fields will have been cleared,
      the payload pointer and size will be stored in data and datalen and the default
      quota size from the key_type struct will be stored into quotalen.
      
      The preparser may parse the payload in any way it likes and may store data in
      the type_data[] and payload fields for use by the instantiate() and update()
      ops.
      
      The preparser may also propose a description for the key by attaching it as a
      string to the description field.  This can be used by passing a NULL or ""
      description to the add_key() system call or the key_create_or_update()
      function.  This cannot work with request_key() as that required the description
      to tell the upcall about the key to be created.
      
      This, for example permits keys that store PGP public keys to generate their own
      name from the user ID and public key fingerprint in the key.
      
      The instantiate() and update() operations are then modified to look like this:
      
      	int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
      	int (*update)(struct key *key, struct key_preparsed_payload *prep);
      
      and the new payload data is passed in *prep, whether or not it was preparsed.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      cf7f601c
  2. 28 9月, 2012 5 次提交
    • R
      module: wait when loading a module which is currently initializing. · 9bb9c3be
      Rusty Russell 提交于
      The original module-init-tools module loader used a fnctl lock on the
      .ko file to avoid attempts to simultaneously load a module.
      Unfortunately, you can't get an exclusive fcntl lock on a read-only
      fd, making this not work for read-only mounted filesystems.
      module-init-tools has a hacky sleep-and-loop for this now.
      
      It's not that hard to wait in the kernel, and only return -EEXIST once
      the first module has finished loading (or continue loading the module
      if the first one failed to initialize for some reason).  It's also
      consistent with what we do for dependent modules which are still loading.
      Suggested-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      9bb9c3be
    • R
      module: fix symbol waiting when module fails before init · 6f13909f
      Rusty Russell 提交于
      We use resolve_symbol_wait(), which blocks if the module containing
      the symbol is still loading.  However:
      
      1) The module_wq we use is only woken after calling the modules' init
         function, but there are other failure paths after the module is
         placed in the linked list where we need to do the same thing.
      
      2) wake_up() only wakes one waiter, and our waitqueue is shared by all
         modules, so we need to wake them all.
      
      3) wake_up_all() doesn't imply a memory barrier: I feel happier calling
         it after we've grabbed and dropped the module_mutex, not just after
         the state assignment.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      6f13909f
    • D
      Make most arch asm/module.h files use asm-generic/module.h · 786d35d4
      David Howells 提交于
      Use the mapping of Elf_[SPE]hdr, Elf_Addr, Elf_Sym, Elf_Dyn, Elf_Rel/Rela,
      ELF_R_TYPE() and ELF_R_SYM() to either the 32-bit version or the 64-bit version
      into asm-generic/module.h for all arches bar MIPS.
      
      Also, use the generic definition mod_arch_specific where possible.
      
      To this end, I've defined three new config bools:
      
       (*) HAVE_MOD_ARCH_SPECIFIC
      
           Arches define this if they don't want to use the empty generic
           mod_arch_specific struct.
      
       (*) MODULES_USE_ELF_RELA
      
           Arches define this if their modules can contain RELA records.  This causes
           the Elf_Rela mapping to be emitted and allows apply_relocate_add() to be
           defined by the arch rather than have the core emit an error message.
      
       (*) MODULES_USE_ELF_REL
      
           Arches define this if their modules can contain REL records.  This causes
           the Elf_Rel mapping to be emitted and allows apply_relocate() to be
           defined by the arch rather than have the core emit an error message.
      
      Note that it is possible to allow both REL and RELA records: m68k and mips are
      two arches that do this.
      
      With this, some arch asm/module.h files can be deleted entirely and replaced
      with a generic-y marker in the arch Kbuild file.
      
      Additionally, I have removed the bits from m32r and score that handle the
      unsupported type of relocation record as that's now handled centrally.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSam Ravnborg <sam@ravnborg.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      786d35d4
    • R
      MIPS: Fix module.c build for 32 bit · 6ede8123
      Ralf Baechle 提交于
      Fixes build failure introduced by "Make most arch asm/module.h files use
      asm-generic/module.h" by moving all the RELA processing code to a
      separate file to be used only for RELA processing on 64-bit kernels.
      
        CC      arch/mips/kernel/module.o
      arch/mips/kernel/module.c:250:14: error: 'reloc_handlers_rela' defined but not
      used [-Werror=unused-variable]
      cc1: all warnings being treated as errors
      
      make[6]: *** [arch/mips/kernel/module.o] Error 1
      Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      6ede8123
    • M
      module: taint kernel when lve module is loaded · c99af375
      Matthew Garrett 提交于
      Cloudlinux have a product called lve that includes a kernel module. This
      was previously GPLed but is now under a proprietary license, but the
      module continues to declare MODULE_LICENSE("GPL") and makes use of some
      EXPORT_SYMBOL_GPL symbols. Forcibly taint it in order to avoid this.
      Signed-off-by: NMatthew Garrett <mjg59@srcf.ucam.org>
      Cc: Alex Lyashkov <umka@cloudlinux.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org
      c99af375
  3. 19 9月, 2012 2 次提交
  4. 18 9月, 2012 17 次提交
    • L
      Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 4651afbb
      Linus Torvalds 提交于
      Pull another workqueue fix from Tejun Heo:
       "Unfortunately, yet another late fix.  This too is discovered and fixed
        by Lai.  This bug was introduced during this merge window by commit
        25511a47 ("workqueue: reimplement CPU online rebinding to handle
        idle workers") which started using WORKER_REBIND flag for idle rebind
        too.
      
        The bug is relatively easy to trigger if the CPU rapidly goes through
        off, on and then off (and stay off).  The fix is on the safer side.
        This hasn't been on linux-next yet but I'm pushing early so that it
        can get more exposure before v3.6 release."
      
      * 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
      4651afbb
    • L
      workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn() · 960bd11b
      Lai Jiangshan 提交于
      busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed
      (CPU is down again).  This used to be okay because the flag wasn't
      used for anything else.
      
      However, after 25511a47 "workqueue: reimplement CPU online rebinding
      to handle idle workers", WORKER_REBIND is also used to command idle
      workers to rebind.  If not cleared, the worker may confuse the next
      CPU_UP cycle by having REBIND spuriously set or oops / get stuck by
      prematurely calling idle_worker_rebind().
      
        WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5
       00()
        Hardware name: Bochs
        Modules linked in: test_wq(O-)
        Pid: 33, comm: kworker/1:1 Tainted: G           O 3.6.0-rc1-work+ #3
        Call Trace:
         [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0
         [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20
         [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500
         [<ffffffff810bc16e>] kthread+0xbe/0xd0
         [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
        ---[ end trace e977cf20f4661968 ]---
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500
        PGD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in: test_wq(O-)
        CPU 0
        Pid: 33, comm: kworker/1:1 Tainted: G        W  O 3.6.0-rc1-work+ #3 Bochs Bochs
        RIP: 0010:[<ffffffff810b3db0>]  [<ffffffff810b3db0>] worker_thread+0x360/0x500
        RSP: 0018:ffff88001e1c9de0  EFLAGS: 00010086
        RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
        RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001
        R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580
        R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900
        FS:  0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900)
        Stack:
         ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010
         ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900
         ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340
        Call Trace:
         [<ffffffff810bc16e>] kthread+0xbe/0xd0
         [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
        Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b
        RIP  [<ffffffff810b3db0>] worker_thread+0x360/0x500
         RSP <ffff88001e1c9de0>
        CR2: 0000000000000000
      
      There was no reason to keep WORKER_REBIND on failure in the first
      place - WORKER_UNBOUND is guaranteed to be set in such cases
      preventing incorrectly activating concurrency management.  Always
      clear WORKER_REBIND.
      
      tj: Updated comment and description.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      960bd11b
    • L
      Merge branch 'akpm' (Andrew's patch-bomb) · 08077ca8
      Linus Torvalds 提交于
      Merge fixes from Andrew Morton:
       "13 patches.  12 are fixes and one is a little preparatory thing for
        Andi."
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (13 commits)
        memory hotplug: fix section info double registration bug
        mm/page_alloc: fix the page address of higher page's buddy calculation
        drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe
        compiler.h: add __visible
        pid-namespace: limit value of ns_last_pid to (0, max_pid)
        include/net/sock.h: squelch compiler warning in sk_rmem_schedule()
        slub: consider pfmemalloc_match() in get_partial_node()
        slab: fix starting index for finding another object
        slab: do ClearSlabPfmemalloc() for all pages of slab
        nbd: clear waiting_queue on shutdown
        MAINTAINERS: fix TXT maintainer list and source repo path
        mm/ia64: fix a memory block size bug
        memory hotplug: reset pgdat->kswapd to NULL if creating kernel thread fails
      08077ca8
    • Q
      memory hotplug: fix section info double registration bug · f14851af
      qiuxishi 提交于
      There may be a bug when registering section info.  For example, on my
      Itanium platform, the pfn range of node0 includes the other nodes, so
      other nodes' section info will be double registered, and memmap's page
      count will equal to 3.
      
        node0: start_pfn=0x100,    spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00
        node1: start_pfn=0x80000,  spanned_pfn=0x80000,  present_pfn=0x80000, => 0x080000-0x100000
        node2: start_pfn=0x100000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x100000-0x180000
        node3: start_pfn=0x180000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x180000-0x200000
      
        free_all_bootmem_node()
      	register_page_bootmem_info_node()
      		register_page_bootmem_info_section()
      
      When hot remove memory, we can't free the memmap's page because
      page_count() is 2 after put_page_bootmem().
      
        sparse_remove_one_section()
      	free_section_usemap()
      		free_map_bootmem()
      			put_page_bootmem()
      
      [akpm@linux-foundation.org: add code comment]
      Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f14851af
    • L
      mm/page_alloc: fix the page address of higher page's buddy calculation · 0ba8f2d5
      Li Haifeng 提交于
      The heuristic method for buddy has been introduced since commit
      43506fad ("mm/page_alloc.c: simplify calculation of combined index
      of adjacent buddy lists").  But the page address of higher page's buddy
      was wrongly calculated, which will lead page_is_buddy to fail for ever.
      IOW, the heuristic method would be disabled with the wrong page address
      of higher page's buddy.
      
      Calculating the page address of higher page's buddy should be based
      higher_page with the offset between index of higher page and index of
      higher page's buddy.
      Signed-off-by: NHaifeng Li <omycle@gmail.com>
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Cc: KyongHo Cho <pullip.cho@samsung.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: <stable@vger.kernel.org>	[2.6.38+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ba8f2d5
    • K
      drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe · 8dcebaa9
      Kevin Hilman 提交于
      On some platforms, bootloaders are known to do some interesting RTC
      programming.  Without going into the obscurities as to why this may be
      the case, suffice it to say the the driver should not make any
      assumptions about the state of the RTC when the driver loads.  In
      particular, the driver probe should be sure that all interrupts are
      disabled until otherwise programmed.
      
      This was discovered when finding bursty I2C traffic every second on
      Overo platforms.  This I2C overhead was keeping the SoC from hitting
      deep power states.  The cause was found to be the RTC firing every
      second on the I2C-connected TWL PMIC.
      
      Special thanks to Felipe Balbi for suggesting to look for a rogue driver
      as the source of the I2C traffic rather than the I2C driver itself.
      
      Special thanks to Steve Sakoman for helping track down the source of the
      continuous RTC interrups on the Overo boards.
      Signed-off-by: NKevin Hilman <khilman@ti.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Tested-by: NSteve Sakoman <steve@sakoman.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Tested-by: NShubhrajyoti Datta <omaplinuxkernel@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8dcebaa9
    • A
      compiler.h: add __visible · 9a858dc7
      Andi Kleen 提交于
      gcc 4.6+ has support for a externally_visible attribute that prevents the
      optimizer from optimizing unused symbols away.  Add a __visible macro to
      use it with that compiler version or later.
      
      This is used (at least) by the "Link Time Optimization" patchset.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a858dc7
    • A
      pid-namespace: limit value of ns_last_pid to (0, max_pid) · 579035dc
      Andrew Vagin 提交于
      The kernel doesn't check the pid for negative values, so if you try to
      write -2 to /proc/sys/kernel/ns_last_pid, you will get a kernel panic.
      
      The crash happens because the next pid is -1, and alloc_pidmap() will
      try to access to a nonexistent pidmap.
      
        map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
      Signed-off-by: NAndrew Vagin <avagin@openvz.org>
      Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      579035dc
    • C
      include/net/sock.h: squelch compiler warning in sk_rmem_schedule() · 35c448a8
      Chuck Lever 提交于
      This warning:
      
        In file included from linux/include/linux/tcp.h:227:0,
                         from linux/include/linux/ipv6.h:221,
                         from linux/include/net/ipv6.h:16,
                         from linux/include/linux/sunrpc/clnt.h:26,
                         from linux/net/sunrpc/stats.c:22:
        linux/include/net/sock.h: In function `sk_rmem_schedule':
        linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
      
      is seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the
      -Wextra option.
      
      Commit c76562b6 ("netvm: prevent a stream-specific deadlock")
      accidentally replaced the "size" parameter of sk_rmem_schedule() with an
      unsigned int.  This changes the semantics of the comparison in the
      return statement.
      
      In sk_wmem_schedule we have syntactically the same comparison, but
      "size" is a signed integer.  In addition, __sk_mem_schedule() takes a
      signed integer for its "size" parameter, so there is an implicit type
      conversion in sk_rmem_schedule() anyway.
      
      Revert the "size" parameter back to a signed integer so that the
      semantics of the expressions in both sk_[rw]mem_schedule() are exactly
      the same.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      35c448a8
    • J
      slub: consider pfmemalloc_match() in get_partial_node() · 8ba00bb6
      Joonsoo Kim 提交于
      get_partial() is currently not checking pfmemalloc_match() meaning that
      it is possible for pfmemalloc pages to leak to non-pfmemalloc users.
      This is a problem in the following situation.  Assume that there is a
      request from normal allocation and there are no objects in the per-cpu
      cache and no node-partial slab.
      
      In this case, slab_alloc enters the slow path and new_slab_objects() is
      called which may return a PFMEMALLOC page.  As the current user is not
      allowed to access PFMEMALLOC page, deactivate_slab() is called
      ([5091b74a: mm: slub: optimise the SLUB fast path to avoid pfmemalloc
      checks]) and returns an object from PFMEMALLOC page.
      
      Next time, when we get another request from normal allocation,
      slab_alloc() enters the slow-path and calls new_slab_objects().  In
      new_slab_objects(), we call get_partial() and get a partial slab which
      was just deactivated but is a pfmemalloc page.  We extract one object
      from it and re-deactivate.
      
        "deactivate -> re-get in get_partial -> re-deactivate" occures repeatedly.
      
      As a result, access to PFMEMALLOC page is not properly restricted and it
      can cause a performance degradation due to frequent deactivation.
      deactivation frequently.
      
      This patch changes get_partial_node() to take pfmemalloc_match() into
      account and prevents the "deactivate -> re-get in get_partial()
      scenario.  Instead, new_slab() is called.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ba00bb6
    • J
      slab: fix starting index for finding another object · d014dc2e
      Joonsoo Kim 提交于
      In array cache, there is a object at index 0, check it.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d014dc2e
    • M
      slab: do ClearSlabPfmemalloc() for all pages of slab · 30c29bea
      Mel Gorman 提交于
      Right now, we call ClearSlabPfmemalloc() for first page of slab when we
      clear SlabPfmemalloc flag.  This is fine for most swap-over-network use
      cases as it is expected that order-0 pages are in use.  Unfortunately it
      is possible that that __ac_put_obj() checks SlabPfmemalloc on a tail
      page and while this is harmless, it is sloppy.  This patch ensures that
      the head page is always used.
      
      This problem was originally identified by Joonsoo Kim.
      
      [js1304@gmail.com: Original implementation and problem identification]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30c29bea
    • P
      nbd: clear waiting_queue on shutdown · fded4e09
      Paul Clements 提交于
      Fix a serious but uncommon bug in nbd which occurs when there is heavy
      I/O going to the nbd device while, at the same time, a failure (server,
      network) or manual disconnect of the nbd connection occurs.
      
      There is a small window between the time that the nbd_thread is stopped
      and the socket is shutdown where requests can continue to be queued to
      nbd's internal waiting_queue.  When this happens, those requests are
      never completed or freed.
      
      The fix is to clear the waiting_queue on shutdown of the nbd device, in
      the same way that the nbd request queue (queue_head) is already being
      cleared.
      Signed-off-by: NPaul Clements <paul.clements@steeleye.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fded4e09
    • G
      MAINTAINERS: fix TXT maintainer list and source repo path · e9b7d7c8
      Gang Wei 提交于
      Signed-off-by: NGang Wei <gang.wei@intel.com>
      Cc: Richard L Maliszewski <richard.l.maliszewski@intel.com>
      Cc: Gang Wei <gang.wei@intel.com>
      Cc: Shane Wang <shane.wang@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9b7d7c8
    • J
      mm/ia64: fix a memory block size bug · 05cf9639
      Jianguo Wu 提交于
      I found following definition in include/linux/memory.h, in my IA64
      platform, SECTION_SIZE_BITS is equal to 32, and MIN_MEMORY_BLOCK_SIZE
      will be 0.
      
        #define MIN_MEMORY_BLOCK_SIZE     (1 << SECTION_SIZE_BITS)
      
      Because MIN_MEMORY_BLOCK_SIZE is int type and length of 32bits,
      so MIN_MEMORY_BLOCK_SIZE(1 << 32) will will equal to 0.
      Actually when SECTION_SIZE_BITS >= 31, MIN_MEMORY_BLOCK_SIZE will be wrong.
      This will cause wrong system memory infomation in sysfs.
      I think it should be:
      
        #define MIN_MEMORY_BLOCK_SIZE     (1UL << SECTION_SIZE_BITS)
      
      And "echo offline > memory0/state" will cause following call trace:
      
        kernel BUG at mm/memory_hotplug.c:885!
        sh[6455]: bugcheck! 0 [1]
        Pid: 6455, CPU 0, comm:                   sh
        psr : 0000101008526030 ifs : 8000000000000fa4 ip  : [<a0000001008c40f0>]    Not tainted (3.6.0-rc1)
        ip is at offline_pages+0x210/0xee0
        Call Trace:
          show_stack+0x80/0xa0
          show_regs+0x640/0x920
          die+0x190/0x2c0
          die_if_kernel+0x50/0x80
          ia64_bad_break+0x3d0/0x6e0
          ia64_native_leave_kernel+0x0/0x270
          offline_pages+0x210/0xee0
          alloc_pages_current+0x180/0x2a0
      Signed-off-by: NJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05cf9639
    • W
      memory hotplug: reset pgdat->kswapd to NULL if creating kernel thread fails · 18b48d58
      Wen Congyang 提交于
      If kthread_run() fails, pgdat->kswapd contains errno.  When we stop this
      thread, we only check whether pgdat->kswapd is NULL and access it.  If
      it contains errno, it will cause page fault.  Reset pgdat->kswapd to
      NULL when creating kernel thread fails can avoid this problem.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      18b48d58
    • L
      Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · 2ade0b7f
      Linus Torvalds 提交于
      Pull InfiniBand/RDMA fixes from Roland Dreier:
       - A couple more IPoIB fixes for regressions introduced by path database
         conversion
       - Minor other fixes to low-level drivers (cxgb4, mlx4, qib, ocrdma)
      
      * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
        IB/qib: Fix failure of compliance test C14-024#06_LocalPortNum
        RDMA/ocrdma: Fix CQE expansion of unsignaled WQE
        mlx4_core: Fix integer overflows so 8TBs of memory registration works
        IPoIB: Fix AB-BA deadlock when deleting neighbours
        IPoIB: Fix memory leak in the neigh table deletion flow
        RDMA/cxgb4: Move dereference below NULL test
      2ade0b7f