You need to sign in or sign up before continuing.
  1. 14 2月, 2020 2 次提交
  2. 09 2月, 2020 1 次提交
    • L
      pipe: use exclusive waits when reading or writing · 0ddad21d
      Linus Torvalds 提交于
      This makes the pipe code use separate wait-queues and exclusive waiting
      for readers and writers, avoiding a nasty thundering herd problem when
      there are lots of readers waiting for data on a pipe (or, less commonly,
      lots of writers waiting for a pipe to have space).
      
      While this isn't a common occurrence in the traditional "use a pipe as a
      data transport" case, where you typically only have a single reader and
      a single writer process, there is one common special case: using a pipe
      as a source of "locking tokens" rather than for data communication.
      
      In particular, the GNU make jobserver code ends up using a pipe as a way
      to limit parallelism, where each job consumes a token by reading a byte
      from the jobserver pipe, and releases the token by writing a byte back
      to the pipe.
      
      This pattern is fairly traditional on Unix, and works very well, but
      will waste a lot of time waking up a lot of processes when only a single
      reader needs to be woken up when a writer releases a new token.
      
      A simplified test-case of just this pipe interaction is to create 64
      processes, and then pass a single token around between them (this
      test-case also intentionally passes another token that gets ignored to
      test the "wake up next" logic too, in case anybody wonders about it):
      
          #include <unistd.h>
      
          int main(int argc, char **argv)
          {
              int fd[2], counters[2];
      
              pipe(fd);
              counters[0] = 0;
              counters[1] = -1;
              write(fd[1], counters, sizeof(counters));
      
              /* 64 processes */
              fork(); fork(); fork(); fork(); fork(); fork();
      
              do {
                      int i;
                      read(fd[0], &i, sizeof(i));
                      if (i < 0)
                              continue;
                      counters[0] = i+1;
                      write(fd[1], counters, (1+(i & 1)) *sizeof(int));
              } while (counters[0] < 1000000);
              return 0;
          }
      
      and in a perfect world, passing that token around should only cause one
      context switch per transfer, when the writer of a token causes a
      directed wakeup of just a single reader.
      
      But with the "writer wakes all readers" model we traditionally had, on
      my test box the above case causes more than an order of magnitude more
      scheduling: instead of the expected ~1M context switches, "perf stat"
      shows
      
              231,852.37 msec task-clock                #   15.857 CPUs utilized
              11,250,961      context-switches          #    0.049 M/sec
                 616,304      cpu-migrations            #    0.003 M/sec
                   1,648      page-faults               #    0.007 K/sec
       1,097,903,998,514      cycles                    #    4.735 GHz
         120,781,778,352      instructions              #    0.11  insn per cycle
          27,997,056,043      branches                  #  120.754 M/sec
             283,581,233      branch-misses             #    1.01% of all branches
      
            14.621273891 seconds time elapsed
      
             0.018243000 seconds user
             3.611468000 seconds sys
      
      before this commit.
      
      After this commit, I get
      
                5,229.55 msec task-clock                #    3.072 CPUs utilized
               1,212,233      context-switches          #    0.232 M/sec
                 103,951      cpu-migrations            #    0.020 M/sec
                   1,328      page-faults               #    0.254 K/sec
          21,307,456,166      cycles                    #    4.074 GHz
          12,947,819,999      instructions              #    0.61  insn per cycle
           2,881,985,678      branches                  #  551.096 M/sec
              64,267,015      branch-misses             #    2.23% of all branches
      
             1.702148350 seconds time elapsed
      
             0.004868000 seconds user
             0.110786000 seconds sys
      
      instead. Much better.
      
      [ Note! This kernel improvement seems to be very good at triggering a
        race condition in the make jobserver (in GNU make 4.2.1) for me. It's
        a long known bug that was fixed back in June 2017 by GNU make commit
        b552b0525198 ("[SV 51159] Use a non-blocking read with pselect to
        avoid hangs.").
      
        But there wasn't a new release of GNU make until 4.3 on Jan 19 2020,
        so a number of distributions may still have the buggy version. Some
        have backported the fix to their 4.2.1 release, though, and even
        without the fix it's quite timing-dependent whether the bug actually
        is hit. ]
      
      Josh Triplett says:
       "I've been hammering on your pipe fix patch (switching to exclusive
        wait queues) for a month or so, on several different systems, and I've
        run into no issues with it. The patch *substantially* improves
        parallel build times on large (~100 CPU) systems, both with parallel
        make and with other things that use make's pipe-based jobserver.
      
        All current distributions (including stable and long-term stable
        distributions) have versions of GNU make that no longer have the
        jobserver bug"
      Tested-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ddad21d
  3. 08 2月, 2020 12 次提交
  4. 07 2月, 2020 5 次提交
  5. 06 2月, 2020 2 次提交
    • Q
      skbuff: fix a data race in skb_queue_len() · 86b18aaa
      Qian Cai 提交于
      sk_buff.qlen can be accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg
      
       read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
        unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
      				 net/unix/af_unix.c:1761
        ____sys_sendmsg+0x33e/0x370
        ___sys_sendmsg+0xa6/0xf0
        __sys_sendmsg+0x69/0xf0
        __x64_sys_sendmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
        __skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
        __skb_try_recv_datagram+0xbe/0x220
        unix_dgram_recvmsg+0xee/0x850
        ____sys_recvmsg+0x1fb/0x210
        ___sys_recvmsg+0xa2/0xf0
        __sys_recvmsg+0x66/0xf0
        __x64_sys_recvmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Since only the read is operating as lockless, it could introduce a logic
      bug in unix_recvq_full() due to the load tearing. Fix it by adding
      a lockless variant of skb_queue_len() and unix_recvq_full() where
      READ_ONCE() is on the read while WRITE_ONCE() is on the write similar to
      the commit d7d16a89 ("net: add skb_queue_empty_lockless()").
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86b18aaa
    • G
      of: clk: Make <linux/of_clk.h> self-contained · 5df86714
      Geert Uytterhoeven 提交于
      Depending on include order:
      
          include/linux/of_clk.h:11:45: warning: ‘struct device_node’ declared inside parameter list will not be visible outside of this definition or declaration
           unsigned int of_clk_get_parent_count(struct device_node *np);
      						 ^~~~~~~~~~~
          include/linux/of_clk.h:12:43: warning: ‘struct device_node’ declared inside parameter list will not be visible outside of this definition or declaration
           const char *of_clk_get_parent_name(struct device_node *np, int index);
      					       ^~~~~~~~~~~
          include/linux/of_clk.h:13:31: warning: ‘struct of_device_id’ declared inside parameter list will not be visible outside of this definition or declaration
           void of_clk_init(const struct of_device_id *matches);
      				   ^~~~~~~~~~~~
      
      Fix this by adding forward declarations for struct device_node and
      struct of_device_id.
      Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Link: https://lkml.kernel.org/r/20200205194649.31309-1-geert+renesas@glider.beSigned-off-by: NStephen Boyd <sboyd@kernel.org>
      5df86714
  6. 05 2月, 2020 3 次提交
    • E
      bonding/alb: properly access headers in bond_alb_xmit() · 38f88c45
      Eric Dumazet 提交于
      syzbot managed to send an IPX packet through bond_alb_xmit()
      and af_packet and triggered a use-after-free.
      
      First, bond_alb_xmit() was using ipx_hdr() helper to reach
      the IPX header, but ipx_hdr() was using the transport offset
      instead of the network offset. In the particular syzbot
      report transport offset was 0xFFFF
      
      This patch removes ipx_hdr() since it was only (mis)used from bonding.
      
      Then we need to make sure IPv4/IPv6/IPX headers are pulled
      in skb->head before dereferencing anything.
      
      BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
      Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108
       (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...)
      
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       [<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline]
       [<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53
       [<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282
       [<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline]
       [<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline]
       [<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422
       [<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469
       [<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
       [<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline]
       [<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224
       [<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline]
       [<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline]
       [<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline]
       [<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627
       [<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238
       [<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278
       [<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline]
       [<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252
       [<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline]
       [<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684
       [<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996
       [<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline]
       [<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38f88c45
    • A
      net: dsa: microchip: Platform data shan't include kernel.h · 8b7a07c7
      Andy Shevchenko 提交于
      Replace with appropriate types.h.
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b7a07c7
    • A
      net: dsa: b53: Platform data shan't include kernel.h · e22e0790
      Andy Shevchenko 提交于
      Replace with appropriate types.h.
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e22e0790
  7. 04 2月, 2020 15 次提交