diff --git a/doc/TODO.detail/performance b/doc/TODO.detail/performance
index d77e123b1390d6eabd5b2318d9416d72c182741d..62b302014e8b56a780eb8deccba31de8e26714c9 100644
--- a/doc/TODO.detail/performance
+++ b/doc/TODO.detail/performance
@@ -345,7 +345,7 @@
 From owner-pgsql-hackers@hub.org Tue Oct 19 10:31:10 1999
 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087 for ; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14 $) with ESMTP id KAA27535 for ; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15 $) with ESMTP id KAA27535 for ; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
 Received: from localhost (majordom@localhost) by hub.org (8.9.3/8.9.3) with SMTP id KAA30328; Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
@@ -454,7 +454,7 @@
 From owner-pgsql-hackers@hub.org Tue Oct 19 21:25:30 1999
 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130 for ; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14 $) with ESMTP id VAA10512 for ; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15 $) with ESMTP id VAA10512 for ; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
 Received: from localhost (majordom@localhost) by hub.org (8.9.3/8.9.3) with SMTP id VAA50745; Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
@@ -1006,7 +1006,7 @@
 From pgsql-general-owner+M2497@hub.org Fri Jun 16 18:31:03 2000
 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04165 for ; Fri, 16 Jun 2000 17:31:01 -0400 (EDT)
-Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.14 $) with ESMTP id RAA13110 for ; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
+Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15 $) with ESMTP id RAA13110 for ; Fri, 16 Jun 2000 17:20:12 -0400 (EDT)
 Received: from hub.org (majordom@localhost [127.0.0.1]) by hub.org (8.10.1/8.10.1) with SMTP id e5GLDaM14477; Fri, 16 Jun 2000 17:13:36 -0400 (EDT)
@@ -3032,3 +3032,133 @@
 Curt Sampson +81 90 7737 2974 http://www.netbsd.org
 Don't you know, in this new Dark Age, we're all light. --XTC
+From cjs@cynic.net Wed Apr 24 23:19:23 2002
+Return-path:
+Received: from angelic.cynic.net ([202.232.117.21])
+    by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g3P3JM414917
+    for ; Wed, 24 Apr 2002 23:19:22 -0400 (EDT)
+Received: from localhost (localhost [127.0.0.1])
+    by angelic.cynic.net (Postfix) with ESMTP
+    id 1F36F870E; Thu, 25 Apr 2002 12:19:14 +0900 (JST)
+Date: Thu, 25 Apr 2002 12:19:14 +0900 (JST)
+From: Curt Sampson
+To: Bruce Momjian
+cc: PostgreSQL-development
+Subject: Re: Sequential Scan Read-Ahead
+In-Reply-To: <200204250156.g3P1ufh05751@candle.pha.pa.us>
+Message-ID:
+MIME-Version: 1.0
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+Status: OR
+
+On Wed, 24 Apr 2002, Bruce Momjian wrote:
+
+> > 1. Not all systems do readahead.
+>
+> If they don't, that isn't our problem. We expect it to be there, and if
+> it isn't, the vendor/kernel is at fault.
+
+It is your problem when another database kicks Postgres' ass
+performance-wise.
+
+And at that point, *you're* at fault. You're the one who's knowingly
+decided to do things inefficiently.
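+
+(Purely as an illustration of the point: on a system that provides
+posix_fadvise(), an application can at least ask for sequential
+read-ahead explicitly instead of hoping the kernel's heuristic notices
+the access pattern. A minimal sketch follows; the file name is made up
+and error handling is kept to a minimum.)
+
+    /*
+     * Sketch: request sequential read-ahead explicitly rather than
+     * relying on the kernel to detect the pattern.  Assumes a POSIX
+     * system with posix_fadvise(); "/tmp/bigtable" is a made-up path.
+     */
+    #include <fcntl.h>
+    #include <stdio.h>
+    #include <unistd.h>
+
+    int
+    main(void)
+    {
+        char        buf[8192];
+        ssize_t     n;
+        int         fd = open("/tmp/bigtable", O_RDONLY);
+
+        if (fd < 0)
+        {
+            perror("open");
+            return 1;
+        }
+
+        /* Hint: the whole file will be read sequentially. */
+        if (posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) != 0)
+            fprintf(stderr, "posix_fadvise hint not taken here\n");
+
+        while ((n = read(fd, buf, sizeof(buf))) > 0)
+            ;                   /* process the 8K block here */
+
+        close(fd);
+        return 0;
+    }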
+
+Sorry if this sounds harsh, but this "Oh, someone else is to blame"
+attitude gets me steamed. It's one thing to say, "We don't support
+this." That's fine; there are often good reasons for that. It's a
+completely different thing to say, "It's an unrelated entity's fault we
+don't support this."
+
+At any rate, relying on the kernel to guess how to optimise for
+the workload will never work as well as letting the software that
+knows the workload do the optimization itself.
+
+The lack of support is no joke. Sure, lots of systems nowadays
+support a unified buffer cache and read-ahead. But how many, besides
+Solaris, support free-behind, which is also very important to avoid
+blowing out your buffer cache when doing sequential reads? And who
+supports read-ahead for reverse scans at all? (Or does Postgres
+not do those, anyway? I can see the support is there.)
+
+And even when the facilities are there, you create problems by
+using them. Look at the OS buffer cache, for example. Not only do
+we lose efficiency by using two layers of caching, but (as people
+have pointed out recently on the lists) the optimizer can't even
+know how much or what is being cached, and thus can't make decisions
+based on that.
+
+> Yes, seek() in file will turn off read-ahead. Grabbing bigger chunks
+> would help here, but if you have two people already reading from the
+> same file, grabbing bigger chunks of the file may not be optimal.
+
+Grabbing bigger chunks is always optimal, AFAICT, if they're not
+*too* big and you use the data. A single 64K read takes very little
+longer than a single 8K read. (There is a rough sketch of this after
+the numbered list below.)
+
+> > 3. Even when the read-ahead does occur, you're still doing more
+> > syscalls, and thus more expensive kernel/userland transitions, than
+> > you have to.
+>
+> I would guess the performance impact is minimal.
+
+If it were minimal, people wouldn't work so hard to build multi-level
+thread systems, where multiple userland threads are scheduled on
+top of kernel threads.
+
+However, it does depend on how much CPU your particular application
+is using. You may have it to spare.
+
+> http://candle.pha.pa.us/mhonarc/todo.detail/performance/msg00009.html
+
+Well, this message has some points in it that I feel are just incorrect.
+
+    1. It is *not* true that you have no idea where data is when
+    using a storage array or other similar system. While you
+    certainly ought not worry about things such as head positions
+    and so on, it's been a given for a long, long time that two
+    blocks that have close index numbers are going to be close
+    together in physical storage.
+
+    2. Raw devices are quite standard across Unix systems (except
+    in the unfortunate case of Linux, which I think has been
+    remedied, hasn't it?). They're very portable, and their write
+    semantics are just as well defined as a filesystem's, if not
+    better.
+
+    3. My observations of OS performance tuning over the past six
+    or eight years contradict the statement, "There's a considerable
+    cost in complexity and code in using "raw" storage too, and
+    it's not a one off cost: as the technologies change, the "fast"
+    way to do things will change and the code will have to be
+    updated to match." While some optimizations have been removed
+    over the years, the basic ones (order reads by block number,
+    do larger reads rather than smaller, cache the data) have
+    remained unchanged for a long, long time.
+
+    4. "Better to leave this to the OS vendor where possible, and
+    take advantage of the tuning they do." Well, sorry guys, but
+    have a look at the tuning they do. It hasn't changed in years,
+    except to remove now-unnecessary complexity related to really,
+    really old and slow disk devices, and to add a few things that
+    guess at the workload but still do a worse job than if the
+    workload generator just did its own optimisations in the first
+    place.
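+
+(To put a number on the "larger reads" point: the sketch below reads
+the same file in 8K and then in 64K chunks and simply counts read()
+calls; the 64K scan moves the same data through one eighth as many
+syscalls. Illustration only; the path is made up and the timing is
+left to the reader.)
+
+    /*
+     * Sketch: same data, different chunk sizes, different syscall
+     * counts.  Assumes a POSIX system; "/tmp/bigtable" is a made-up
+     * example path.
+     */
+    #include <fcntl.h>
+    #include <stdio.h>
+    #include <unistd.h>
+
+    static long
+    scan(const char *path, size_t chunk)
+    {
+        static char buf[64 * 1024];     /* large enough for either size */
+        long        calls = 0;
+        ssize_t     n;
+        int         fd = open(path, O_RDONLY);
+
+        if (fd < 0)
+            return -1;
+        while ((n = read(fd, buf, chunk)) > 0)
+            calls++;                    /* one kernel round trip per chunk */
+        close(fd);
+        return calls;
+    }
+
+    int
+    main(void)
+    {
+        printf("8K reads:  %ld read() calls\n", scan("/tmp/bigtable", 8 * 1024));
+        printf("64K reads: %ld read() calls\n", scan("/tmp/bigtable", 64 * 1024));
+        return 0;
+    }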
+
+> http://candle.pha.pa.us/mhonarc/todo.detail/optimizer/msg00011.html
+
+Well, this one, with statements like "Postgres does have control
+over its buffer cache," I don't know what to say. You can interpret
+the statement however you like, but in the end Postgres has very
+little control at all over how data is moved between memory and disk.
+
+BTW, please don't take me as saying that all control over physical
+IO should be done by Postgres. I just think that Postgres could do
+a better job of managing data transfer between disk and memory than
+the OS can. The rest of the things (using raw partitions, read-ahead,
+free-behind, etc.) just drop out of that one idea.
+
+cjs
+--
+Curt Sampson +81 90 7737 2974 http://www.netbsd.org
+    Don't you know, in this new Dark Age, we're all light. --XTC
+
+
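+(And since free-behind keeps coming up: the idea is simply that a big
+sequential scan should release what it has already consumed instead of
+letting it evict everything else in the cache. Where posix_fadvise()
+exists, an application can approximate that itself. A rough sketch,
+not what Postgres actually does; the path is made up.)
+
+    /*
+     * Sketch of application-driven free-behind: as a sequential scan
+     * proceeds, tell the kernel it may drop the pages already read so
+     * the scan does not blow out the OS buffer cache.  Assumes
+     * posix_fadvise(); "/tmp/bigtable" is a made-up path.
+     */
+    #include <fcntl.h>
+    #include <stdio.h>
+    #include <unistd.h>
+
+    #define DROP_INTERVAL   (256 * 1024)
+
+    int
+    main(void)
+    {
+        char        buf[64 * 1024];
+        off_t       done = 0;
+        ssize_t     n;
+        int         fd = open("/tmp/bigtable", O_RDONLY);
+
+        if (fd < 0)
+        {
+            perror("open");
+            return 1;
+        }
+
+        while ((n = read(fd, buf, sizeof(buf))) > 0)
+        {
+            /* ... process the block here ... */
+            done += n;
+            if (done % DROP_INTERVAL == 0)
+            {
+                /* Let the kernel discard everything read so far. */
+                (void) posix_fadvise(fd, 0, done, POSIX_FADV_DONTNEED);
+            }
+        }
+
+        close(fd);
+        return 0;
+    }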