1. 24 Mar 2015: 1 commit
2. 21 Mar 2015: 5 commits
3. 19 Mar 2015: 10 commits
4. 18 Mar 2015: 2 commits
5. 17 Mar 2015: 1 commit
• livepatch: Fix subtle race with coming and going modules · 8cb2c2dc
Authored by Petr Mladek
There is a notifier that handles live patches for coming and going modules.
It takes the klp_mutex lock to avoid races with coming and going patches, but
it does not hold the lock the whole time. Therefore the following races are
possible:
      
  1. The notifier is called sometime in MODULE_STATE_COMING. The module
     is visible to find_module() during this entire state, which means a
     new patch can be registered and enabled even before the notifier is
     called. This can produce a wrong order of stacked patches; see below
     for an example.
      
  2. A new patch could still see the module in the GOING state even after
     the notifier has been called. It would try to initialize the related
     object structures, but the module could disappear at any time, leaving
     the structures in a mess. It might even cause an invalid memory
     access.
      
This patch solves the problem by adding a boolean variable to struct module.
The value is true after the coming handler and before the going handler is
called. New patches are applied only while the value is true; the module is
ignored while it is false.
      
Note that we need to know the state of all modules on the system. The races
are triggered by new patches, so we cannot know in advance which modules will
get patched.
      
Also note that we cannot simply ignore going modules. Code from the module
can still be called in the GOING state until mod->exit() finishes. If we
start supporting patches with semantic changes between function calls, we
need to apply new patches to any code that is still usable.
      See below for an example.
      
Finally, note that this patch solves only the situation when a new patch is
registered. There are no such problems when a patch is being removed. It
does not matter who disables the patch first, whether the normal
disable_patch() or the module notifier; there is nothing to do once the
patch is disabled.
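
A minimal sketch of the approach, in C (the flag name klp_alive matches this
patch; the lookup helper below is simplified and other struct module members
are elided):

	/* struct module gains a liveness flag for livepatching (sketch): */
	struct module {
		/* ... existing members elided ... */
	#ifdef CONFIG_LIVEPATCH
		bool klp_alive;	/* set by the COMING handler, cleared by GOING */
	#endif
	};

	/* New patches look up module objects through a check like this,
	 * so modules outside the coming/going window are ignored: */
	static struct module *klp_find_object_module(struct klp_object *obj)
	{
		struct module *mod;

		mutex_lock(&module_mutex);
		mod = find_module(obj->name);
		if (mod && !mod->klp_alive)
			mod = NULL;
		mutex_unlock(&module_mutex);

		return mod;
	}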
      
      Alternative solutions:
      ======================
      
      + reject new patches when a patched module is coming or going; this is ugly
      
+ wait with adding a new patch until the module leaves the COMING and GOING
  states; this might be dangerous and complicated; we would need to release
  klp_mutex in the middle of the patch registration to avoid a deadlock
  with the coming and going handlers; also we might need a waitqueue for
  each module, which seems to be even bigger overhead than the boolean
      
+ stop modules from entering the COMING and GOING states; wait until modules
  leave these states if they are already in them; this looks complicated; we
  would need to ignore the module that asked to stop the others to avoid a
  deadlock; also it is unclear what to do when two modules ask to stop the
  others and both are in the COMING state (the situation when two new
  patches are applied)
      
+ always register/enable new patches and fix up the potential mess (the order
  of registered patches) in klp_module_init(); this is nasty and prone to
  regressions as development continues
      
+ add another MODULE_STATE where the kallsyms are visible but the module is
  not used yet; this looks too complex; the module states are checked in
  "many" locations
      
      Example of patch stacking breakage:
      ===================================
      
The notifier could _not_ _simply_ ignore already initialized module objects.
For example, let's have three patches (P1, P2, P3) for functions a() and b(),
where a() is from the core kernel (vmlinux) and b() is from a module M:
      
      	a()	b()
      P1	a1()	b1()
      P2	a2()	b2()
P3	a3()	b3()
      
If the module M is loaded after all patches are registered and enabled,
the ftrace ops for functions a() and b() list the functions in this
order:
      
      	ops_a->func_stack -> list(a3,a2,a1)
      	ops_b->func_stack -> list(b3,b2,b1)
      
so the pointer to b3() is first in the list and will be used.
      
Now consider the following scenario. Start from a state where patches P1 and
P2 are registered and enabled, but the module M is not loaded, so the ftrace
ops for b() does not yet exist. Then we can get into the following race:
      
      CPU0					CPU1
      
      load_module(M)
      
        complete_formation()
      
        mod->state = MODULE_STATE_COMING;
        mutex_unlock(&module_mutex);
      
      					klp_register_patch(P3);
      					klp_enable_patch(P3);
      
      					# STATE 1
      
        klp_module_notify(M)
          klp_module_notify_coming(P1);
          klp_module_notify_coming(P2);
          klp_module_notify_coming(P3);
      
      					# STATE 2
      
The ftrace ops for a() and b() then look like this:
      
        STATE1:
      
      	ops_a->func_stack -> list(a3,a2,a1);
      	ops_b->func_stack -> list(b3);
      
        STATE2:
      	ops_a->func_stack -> list(a3,a2,a1);
      	ops_b->func_stack -> list(b2,b1,b3);
      
Therefore b2() is used for the module while a3() is used for the core kernel,
because each was the last one added to its list.
      
      Example of the race with going modules:
      =======================================
      
      CPU0					CPU1
      
      delete_module()  #SYSCALL
      
         try_stop_module()
           mod->state = MODULE_STATE_GOING;
      
         mutex_unlock(&module_mutex);
      
      					klp_register_patch()
      					klp_enable_patch()
      
      					#save place to switch universe
      
      					b()     # from module that is going
      					  a()   # from core (patched)
      
         mod->exit();
      
      Note that the function b() can be called until we call mod->exit().
      
If we do not apply the patch against b() because it is in MODULE_STATE_GOING,
it will call the patched a() with changed semantics and things might go wrong.
      
      [jpoimboe@redhat.com: use one boolean instead of two]
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
6. 16 Mar 2015: 5 commits
7. 15 Mar 2015: 5 commits
8. 13 Mar 2015: 7 commits
• of/platform: Fix sparc:allmodconfig build · a697c2ef
Authored by Guenter Roeck
      sparc:allmodconfig fails to build with:
      
      drivers/built-in.o: In function `platform_bus_init':
      (.init.text+0x3684): undefined reference to `of_platform_register_reconfig_notifier'
      
of_platform_register_reconfig_notifier is only declared if both OF_ADDRESS
and OF_DYNAMIC are configured, yet the include file only declares a dummy
function if OF_DYNAMIC is not configured. The sparc architecture does not
configure OF_ADDRESS but does configure OF_DYNAMIC, causing the above error.
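
The fix amounts to guarding the declaration on both options, roughly like
this (a sketch; the exact prototype in of_platform.h may differ):

	#if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_OF_DYNAMIC)
	extern void of_platform_register_reconfig_notifier(void);
	#else
	/* Stub for configs (e.g. sparc) with OF_DYNAMIC but no OF_ADDRESS. */
	static inline void of_platform_register_reconfig_notifier(void) { }
	#endif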
      
      Fixes: 801d728c ("of/reconfig: Add OF_DYNAMIC notifier for platform_bus_type")
      Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring <robh@kernel.org>
• rhashtable: kill ht->shift atomic operations · a5b6846f
Authored by Daniel Borkmann
      Commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker
      queue") changed ht->shift to be atomic, which is actually unnecessary.
      
      Instead of leaving the current shift in the core rhashtable structure,
      it can be cached inside the individual bucket tables.
      
There, it is initialized only once, during a new table allocation in the
shrink/expansion slow path, and from then on it stays immutable for the
rest of the bucket table's lifetime.
      
      That allows shift to be non-atomic. The patch also moves hash_rnd
      management into the table setup. The rhashtable structure now consumes
      3 instead of 4 cachelines.
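
A minimal sketch of the resulting layout (locking members and others elided;
names follow the rhashtable code of that era):

	/* Per-table state: written once at allocation, then immutable. */
	struct bucket_table {
		size_t			size;		/* number of buckets */
		u32			hash_rnd;	/* seeded at allocation */
		u32			shift;		/* plain u32, no atomics */
		struct rhash_head __rcu	*buckets[];	/* flexible array */
	};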
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ying Xue <ying.xue@windriver.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
• kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h> · d3733e5c
Authored by Andrey Ryabinin
include/linux/moduleloader.h is a more suitable place for this macro.
Also change the alignment to PAGE_SIZE for CONFIG_KASAN=n, as such
alignment is already assumed in several places.
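
The resulting definition looks roughly like this (a sketch of
<linux/moduleloader.h>; KASAN_SHADOW_SCALE_SHIFT comes from the KASAN
headers):

	#ifdef CONFIG_KASAN
	/* Module mappings must be aligned so their shadow is page-aligned. */
	#define MODULE_ALIGN	(PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
	#else
	#define MODULE_ALIGN	PAGE_SIZE
	#endif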
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• kasan, module, vmalloc: rework shadow allocation for modules · a5af5aa8
Authored by Andrey Ryabinin
The current approach to handling shadow memory for modules is broken.

Shadow memory may be freed only after the memory it shadows is no longer
used. vfree() called from interrupt context can use the memory it is
freeing to store a 'struct llist_node':
      
          void vfree(const void *addr)
          {
          ...
              if (unlikely(in_interrupt())) {
                  struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
                  if (llist_add((struct llist_node *)addr, &p->list))
                          schedule_work(&p->wq);
      
This list node is later used in free_work(), which actually frees the memory.
Currently, module_memfree() called in interrupt context frees the shadow
before freeing the module's memory, which can provoke a kernel crash.
      
So shadow memory should be freed after the module's memory. However, such a
deallocation order could race with kasan_module_alloc() in module_alloc().

Instead, free the shadow right before releasing the vm area. At this point
the vfree()'d memory is no longer used, yet not available for other
allocations. A new VM_KASAN flag indicates that a vm area has dynamically
allocated shadow memory, so kasan frees the shadow only if it was previously
allocated.
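
A sketch of the teardown side (simplified from the vmalloc free path;
kasan_free_shadow() is the KASAN hook this patch pairs with the flag):

	/* In the vfree() slow path (sketch): the shadow goes away just
	 * before the vm area itself, and only if it was ever allocated. */
	if (area->flags & VM_KASAN)
		kasan_free_shadow(area);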
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• ebpf: verifier: check that call reg with ARG_ANYTHING is initialized · 80f1d68c
Authored by Daniel Borkmann
I noticed that a helper function with argument type ARG_ANYTHING does
not need to have an initialized value (register).

In the worst case, this can lead to unintended leakage of stack memory
in future helper functions if they are not carefully designed, or to
unintended application behaviour if the application developer was not
careful enough to match a correct helper function signature in the API.
      
      The underlying issue is that ARG_ANYTHING should actually be split
      into two different semantics:
      
        1) ARG_DONTCARE for function arguments that the helper function
           does not care about (in other words: the default for unused
           function arguments), and
      
  2) ARG_ANYTHING for an argument that is actually used by a helper
     function and *guaranteed* to be an initialized register.
      
      The current risk is low: ARG_ANYTHING is only used for the 'flags'
      argument (r4) in bpf_map_update_elem() that internally does strict
      checking.
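
A sketch of the verifier-side split (simplified from check_func_arg();
names follow the verifier code of that era):

	if (arg_type == ARG_DONTCARE)
		return 0;	/* unused argument: nothing to check */

	if (reg->type == NOT_INIT) {
		verbose("R%d !read_ok\n", regno);
		return -EACCES;	/* ARG_ANYTHING needs an initialized reg */
	}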
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• net: Introduce possible_net_t · 0c5c9fb5
Authored by Eric W. Biederman
      Having to say
      > #ifdef CONFIG_NET_NS
      > 	struct net *net;
      > #endif
      
in structures is a little wordy and a little error-prone.
      
      Instead it is possible to say:
      > typedef struct {
      > #ifdef CONFIG_NET_NS
      >       struct net *net;
      > #endif
      > } possible_net_t;
      
      And then in a header say:
      
      > 	possible_net_t net;
      
This is cleaner, easier to use, and easier to test, as possible_net_t is
always there no matter what the compile options are.

Further, this allows read_pnet and write_pnet to be functions in all
cases, which is better at catching typos.
      
This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates the optional struct net * variables that
write_pnet operates on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.
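
The accessors then reduce to something like this (a sketch following the
pattern above):

	/* Always defined; compile away to nothing without CONFIG_NET_NS. */
	static inline void write_pnet(possible_net_t *pnet, struct net *net)
	{
	#ifdef CONFIG_NET_NS
		pnet->net = net;
	#endif
	}

	static inline struct net *read_pnet(const possible_net_t *pnet)
	{
	#ifdef CONFIG_NET_NS
		return pnet->net;
	#else
		return &init_net;
	#endif
	}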
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• net: Kill hold_net release_net · efd7ef1c
Authored by Eric W. Biederman
hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008. Kill it; removal is long past due.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9. 12 Mar 2015: 4 commits
• xps: must clear sender_cpu before forwarding · c29390c6
Authored by Eric Dumazet
      John reported that my previous commit added a regression
      on his router.
      
      This is because sender_cpu & napi_id share a common location,
      so get_xps_queue() can see garbage and perform an out of bound access.
      
We need to make sure sender_cpu is cleared before doing the transmit;
otherwise any NIC with busy polling enabled (skb_mark_napi_id()) can
trigger this bug.
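
The fix boils down to a small helper called on the forwarding paths (a
sketch; the patch introduces it as skb_sender_cpu_clear()):

	/* Clear the XPS hint before an skb changes direction so the
	 * shared location cannot be misread as a napi_id. */
	static inline void skb_sender_cpu_clear(struct sk_buff *skb)
	{
	#ifdef CONFIG_XPS
		skb->sender_cpu = 0;
	#endif
	}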
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: John <jw@nuclearfallout.net>
Bisected-by: John <jw@nuclearfallout.net>
Fixes: 2bd82484 ("xps: fix xps for stacked devices")
Signed-off-by: David S. Miller <davem@davemloft.net>
• net: add real socket cookies · 33cf7c90
Authored by Eric Dumazet
      A long standing problem in netlink socket dumps is the use
      of kernel socket addresses as cookies.
      
1) It is a security concern.

2) Sockets can be reused quite quickly, so there is
   no guarantee a cookie is used only once to identify
   a flow.

3) The request sock, established sock, and timewait socks
   for a given flow have different cookies.
      
Part of our effort to bring better TCP statistics requires switching
to a different allocator.

In this patch, I chose to use a per-network-namespace 64-bit generator,
and to use it only when a socket needs to be dumped to netlink.
(This might be refined later if needed.)
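
A sketch of the lazy generator (simplified; assumes an atomic64_t
cookie_gen in struct net and an atomic64_t sk_cookie in struct sock):

	static u64 sock_gen_cookie(struct sock *sk)
	{
		while (1) {
			u64 res = atomic64_read(&sk->sk_cookie);

			if (res)
				return res;
			/* First dump of this socket: draw a fresh cookie;
			 * cmpxchg makes concurrent dumpers agree on one. */
			res = atomic64_inc_return(&sock_net(sk)->cookie_gen);
			atomic64_cmpxchg(&sk->sk_cookie, 0, res);
		}
	}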
      
Note that I tried to carry cookies from request socks to established
socks, then to timewait sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• clk: introduce clk_is_match · 3d3801ef
Authored by Michael Turquette
      Some drivers compare struct clk pointers as a means of knowing
      if the two pointers reference the same clock hardware. This behavior is
      dubious (drivers must not dereference struct clk), but did not cause any
regressions until the per-user struct clk patch was merged. Now the test
for matching clks will always fail with per-user struct clks.
      
clk_is_match() is introduced to fix the regression and to keep drivers
from comparing the pointers manually.
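
Conceptually the helper compares the shared clock core rather than the
per-user handles (a sketch; the core member is how per-user handles
reference common hardware in this era of the clk framework):

	bool clk_is_match(const struct clk *p, const struct clk *q)
	{
		if (p == q)		/* same handle, or both NULL */
			return true;

		if (!IS_ERR_OR_NULL(p) && !IS_ERR_OR_NULL(q))
			return p->core == q->core;	/* same hardware */

		return false;
	}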
      
      Fixes: 035a61c3 ("clk: Make clk API return per-user struct clk instances")
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Shawn Guo <shawn.guo@linaro.org>
      Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
[arnd@arndb.de: Fix COMMON_CLK=N && HAS_CLK=Y config]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[sboyd@codeaurora.org: const arguments to clk_is_match() and
remove unnecessary ternary operation]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
• of: mdio: export of_mdio_parse_addr · 33d67377
Authored by Florian Fainelli
Export of_mdio_parse_addr(), which parses a given Ethernet PHY node's
MDIO address, verifies it is within the allowed range, and returns its
value. This is going to be useful for the DSA code, which needs to deal
with multiple layers of MDIO buses.
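
Hypothetical usage from a caller such as DSA (a sketch; dev and phy_node
stand in for the caller's struct device and PHY device_node):

	int addr = of_mdio_parse_addr(dev, phy_node);
	if (addr < 0)
		return addr;	/* missing "reg" or out-of-range address */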
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>