Commit f554af0a, authored by Marc G. Fournier

From: t-ishii@sra.co.jp

Hi, here are patches I promised (against 6.3.2):

* character_length(), position(), substring() are now aware of
          multi-byte characters
* add octet_length()
* add --with-mb option to configure
* new regression tests for EUC_KR
  (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
* add some test cases to the EUC_JP regression test
* fix problem in regress/regress.sh in case of System V
* fix toupper(), tolower() to handle 8bit chars

note that:

o  patches for both configure.in and configure are
included. maybe the one for configure is not necessary.

o pg_proc.h was modified to add octet_length(). I used OIDs
(1374-1379) for that. Please let me know if these numbers are not
appropriate.
Parent 2cbcf461
postgresql 6.3 multi-byte(MB) patch PL2 README Mar 10 1998 postgresql 6.3 multi-byte (MB) support README April 21 1998
Tatsuo Ishii Tatsuo Ishii
t-ishii@sra.co.jp t-ishii@sra.co.jp
...@@ -6,13 +6,13 @@ postgresql 6.3 multi-byte(MB) patch PL2 README Mar 10 1998 ...@@ -6,13 +6,13 @@ postgresql 6.3 multi-byte(MB) patch PL2 README Mar 10 1998
Introduction Introduction
MB patch is intended for allowing PostgreSQL to handle multi-byte MB support is intended to allow PostgreSQL to handle
charachter sets such as EUC(Extende Unix Code), Unicode and Mule multi-byte character sets such as EUC (Extended Unix Code), Unicode and
internal code. With the MB patch you can use multi-byte character sets Mule internal code. With MB enabled you can use multi-byte
in regexp and LIKE. The encoding system chosen is determined at the character sets in regexp, LIKE and some functions. The encoding system
compile time. chosen is determined at compile time.
The patch also fixes some problems concerning with 8-bit single byte MB also fixes some problems concerning 8-bit single byte
character sets including ISO8859. (I would not say all of problems character sets including ISO8859. (I would not say all of problems
have been fixed. I just confirmed that the regression test ran fine have been fixed. I just confirmed that the regression test ran fine
and a few French characters could be used with the patch. Please let and a few French characters could be used with the patch. Please let
...@@ -20,26 +20,33 @@ me know if you find any problem while using 8-bit characters) ...@@ -20,26 +20,33 @@ me know if you find any problem while using 8-bit characters)
How to use How to use
After applying the MB patch, create src/Makefile.custom with a line create src/Makefile.custom with a line including:
including:
MB=encoding_system MB=encoding_system
or run configure with the mb option:
% configure --with-mb=encoding_system
where encoding_system is one of: where encoding_system is one of:
EUC_JP Japanese EUC EUC_JP Japanese EUC
EUC_CN Chinese EUC EUC_CN Chinese EUC
EUC_KR Korean EUC EUC_KR Korean EUC
EUC_TW Taiwan EUC EUC_TW Taiwan EUC
UNICODE Unicode(UTF-8) UNICODE Unicode(UTF-8)
MULE_INTERNAL Mule internal MULE_INTERNAL Mule internal
Example: Example:
% cat Makefile.custom % cat Makefile.custom
MB=EUC_JP MB=EUC_JP
or
If MB is not defined, nothing is changed except better supporting for % configure --with-mb=EUC_JP
If MB is disabled, nothing changes apart from better support for
8-bit single byte character sets. 8-bit single byte character sets.
References References
...@@ -59,6 +66,19 @@ Unicode: http://www.unicode.org/ ...@@ -59,6 +66,19 @@ Unicode: http://www.unicode.org/
History History
April 21, 1998 some enhancements/fixes
* character_length(), position(), substring() are now aware of
multi-byte characters
* add octet_length()
* add --with-mb option to configure
* new regression tests for EUC_KR
(contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
* add some test cases to the EUC_JP regression test
* fix problem in regress/regress.sh in case of System V
* fix toupper(), tolower() to handle 8bit chars
Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1
Mar 10, 1998 PL2 released Mar 10, 1998 PL2 released
* add regression test for EUC_JP, EUC_CN and MULE_INTERNAL * add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
* add an English document (this file) * add an English document (this file)
......
postgresql 6.3 multi-byte (MB) patch PL2 README, written 1998/3/10 postgresql 6.3.2 multi-byte (MB) support README, written 1998/4/21
Tatsuo Ishii Tatsuo Ishii
t-ishii@sra.co.jp t-ishii@sra.co.jp
http://www.sra.co.jp/people/t-ishii/PostgreSQL/ http://www.sra.co.jp/people/t-ishii/PostgreSQL/
Introduction: Introduction:
This patch enables the latest version, 6.3, of the free RDBMS (Relational Database Management System)
PostgreSQL (http://www.postgresql.org/) to handle Japanese EUC Multi-byte support in PostgreSQL has the following features.
and other multi-byte characters. Applying this
patch makes the following possible.
1. As the multi-byte encoding, the EUC of each language (Japanese, Chinese, etc.), Unicode, 1. As the multi-byte encoding, the EUC of each language (Japanese, Chinese, etc.), Unicode,
or mule internal code can be selected at compile time. In the database, or mule internal code can be selected at compile time. In the database,
...@@ -19,45 +17,24 @@ postgresql 6.3 multi-byte (MB) patch PL2 README, written 1998/3/10 ...@@ -19,45 +17,24 @@ postgresql 6.3 multi-byte (MB) patch PL2 README, written 1998/3/10
4. Multi-byte characters can also be used in the data itself 4. Multi-byte characters can also be used in the data itself
5. Regular-expression searches on multi-byte characters are available 5. Regular-expression searches on multi-byte characters are available
6. LIKE searches on multi-byte characters are available 6. LIKE searches on multi-byte characters are available
7. Multi-byte support in character_length(), position(),
and substring()
(However, items 2, 3 and 4 are possible even without applying the patch.) Installation:
By default, PostgreSQL does not support multi-byte characters.
How to obtain postgresql-6.3: This section explains how to enable multi-byte support.
postgresql-6.3.tar.gz can be obtained from the official Japanese mirror site of postgresql,
ftp://ftp.jaist.ac.jp/pub/dbms/PostgreSQL/.
If for some reason it cannot be obtained from there,
ftp://ftp.sra.co.jp/pub/cmd/postgres/6.3/ is also available.
The original postgresql ftp site is ftp://ftp.postgresql.org.
How to obtain this patch:
Please obtain
ftp://ftp.sra.co.jp/pub/cmd/postgres/6.3/patches/6.3mbPL2.patch.gz
How to apply the patch:
Unpack the patch file you obtained.
% gunzip 6.3mbPL2.patch.gz
Unpack the postgresql-6.3 source.
% gtar xfz postgresql-6.3.tar.gz
This creates a directory named postgresql-6.3, so Create a file named src/Makefile.custom and add
cd into it.
% cd postgresql-6.3
Apply the patch with:
% patch -p1 < 6.3mbPL2.patch
Next, create a file named src/Makefile.custom and add
MB=EUC_JP MB=EUC_JP
as a single line. The following encodings, including EUC_JP, can be specified. as a single line. Alternatively, specify it as follows when running configure.
% configure --with-mb=EUC_JP
The following character encodings, including EUC_JP, can be specified.
(In the current implementation, the character encoding is fixed at compile time
and cannot be changed dynamically at run time.)
EUC_JP Japanese EUC EUC_JP Japanese EUC
EUC_CN Chinese EUC based on GB. code set 2 is EUC_CN Chinese EUC based on GB. code set 2 is
...@@ -93,6 +70,22 @@ How to obtain postgresql-6.3: ...@@ -93,6 +70,22 @@ How to obtain postgresql-6.3:
Revision history: Revision history:
1998/4/21 feature additions and bug fixes
* multi-byte support in character_length(), position(),
substring()
* add octet_length() -> initdb must be rerun
* add MB support to the configure options
(ex. configure --with-mb=EUC_JP)
* add regression test for EUC_KR
(contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
* add character_length(), position(), substring(), octet_length()
to the EUC_JP regression test
* fix System V incompatibility in regress.sh
* fix occasional crash when 8-bit characters are passed to
toupper(), tolower()
1998/3/25 PostgreSQL 6.3.1 released; MB PL2 is incorporated
1998/3/10 PL2 released 1998/3/10 PL2 released
* add regression tests for EUC_JP, EUC_CN and MULE_INTERNAL * add regression tests for EUC_JP, EUC_CN and MULE_INTERNAL
(EUC_CN data contributed by he@sra.co.jp) (EUC_CN data contributed by he@sra.co.jp)
......
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
# #
# #
# IDENTIFICATION # IDENTIFICATION
# $Header: /cvsroot/pgsql/src/Makefile.global.in,v 1.40 1998/04/27 14:54:05 scrappy Exp $ # $Header: /cvsroot/pgsql/src/Makefile.global.in,v 1.41 1998/04/27 17:07:22 scrappy Exp $
# #
# NOTES # NOTES
# Essentially all Postgres make files include this file and use the # Essentially all Postgres make files include this file and use the
...@@ -147,6 +147,11 @@ X_CFLAGS= @X_CFLAGS@ ...@@ -147,6 +147,11 @@ X_CFLAGS= @X_CFLAGS@
X_LIBS= @X_LIBS@ X_LIBS= @X_LIBS@
X11_LIBS= -lX11 @X_EXTRA_LIBS@ X11_LIBS= -lX11 @X_EXTRA_LIBS@
#
# enable multi-byte support
# choose one of:
# EUC_JP,EUC_CN,EUC_KR,EUC_TW,UNICODE,MULE_INTERNAL
MB=@MB@
############################################################################## ##############################################################################
# #
......
/* /*
* misc conversion functions between pg_wchar and other encodings. * misc conversion functions between pg_wchar and other encodings.
* Tatsuo Ishii * Tatsuo Ishii
* $Id: utils.c,v 1.1 1998/03/15 07:38:39 scrappy Exp $ * $Id: utils.c,v 1.2 1998/04/27 17:07:53 scrappy Exp $
*/ */
#include <regex/pg_wchar.h> #include <regex/pg_wchar.h>
/* /*
...@@ -324,25 +324,151 @@ static void pg_mule2wchar_with_len(const unsigned char *from, pg_wchar *to, int ...@@ -324,25 +324,151 @@ static void pg_mule2wchar_with_len(const unsigned char *from, pg_wchar *to, int
*to = 0; *to = 0;
} }
static int pg_euc_mblen(const unsigned char *s)
{
int len;
if (*s == SS2) {
len = 2;
} else if (*s == SS3) {
len = 3;
} else if (*s & 0x80) {
len = 2;
} else {
len = 1;
}
return(len);
}
static int pg_eucjp_mblen(const unsigned char *s)
{
return(pg_euc_mblen(s));
}
static int pg_euckr_mblen(const unsigned char *s)
{
return(pg_euc_mblen(s));
}
static int pg_eucch_mblen(const unsigned char *s)
{
int len;
if (*s == SS2) {
len = 3;
} else if (*s == SS3) {
len = 3;
} else if (*s & 0x80) {
len = 2;
} else {
len = 1;
}
return(len);
}
static int pg_euccn_mblen(const unsigned char *s)
{
int len;
if (*s == SS2) {
len = 4;
} else if (*s == SS3) {
len = 3;
} else if (*s & 0x80) {
len = 2;
} else {
len = 1;
}
return(len);
}
static int pg_utf_mblen(const unsigned char *s)
{
int len = 1;
if ((*s & 0x80) == 0) {
len = 1;
} else if ((*s & 0xe0) == 0xc0) {
len = 2;
} else if ((*s & 0xe0) == 0xe0) {
len = 3;
}
return(len);
}
static int pg_mule_mblen(const unsigned char *s)
{
int len;
if (IS_LC1(*s)) {
len = 2;
} else if (IS_LCPRV1(*s)) {
len = 3;
} else if (IS_LC2(*s)) {
len = 3;
} else if (IS_LCPRV2(*s)) {
len = 4;
} else { /* assume ASCII */
len = 1;
}
return(len);
}
typedef struct { typedef struct {
void (*mb2wchar)(); void (*mb2wchar)(); /* convert a multi-byte string to a wchar */
void (*mb2wchar_with_len)(); void (*mb2wchar_with_len)(); /* convert a multi-byte string to a wchar
with a limited length */
int (*mblen)(); /* returns the length of a multi-byte word */
} pg_wchar_tbl; } pg_wchar_tbl;
static pg_wchar_tbl pg_wchar_table[] = { static pg_wchar_tbl pg_wchar_table[] = {
{pg_eucjp2wchar, pg_eucjp2wchar_with_len}, {pg_eucjp2wchar, pg_eucjp2wchar_with_len, pg_eucjp_mblen},
{pg_eucch2wchar, pg_eucch2wchar_with_len}, {pg_eucch2wchar, pg_eucch2wchar_with_len, pg_eucch_mblen},
{pg_euckr2wchar, pg_euckr2wchar_with_len}, {pg_euckr2wchar, pg_euckr2wchar_with_len, pg_euckr_mblen},
{pg_euccn2wchar, pg_euccn2wchar_with_len}, {pg_euccn2wchar, pg_euccn2wchar_with_len, pg_euccn_mblen},
{pg_utf2wchar, pg_utf2wchar_with_len}, {pg_utf2wchar, pg_utf2wchar_with_len, pg_utf_mblen},
{pg_mule2wchar, pg_mule2wchar_with_len}}; {pg_mule2wchar, pg_mule2wchar_with_len, pg_mule_mblen}};
/* convert a multi-byte string to a wchar */
void pg_mb2wchar(const unsigned char *from, pg_wchar *to) void pg_mb2wchar(const unsigned char *from, pg_wchar *to)
{ {
(*pg_wchar_table[MB].mb2wchar)(from,to); (*pg_wchar_table[MB].mb2wchar)(from,to);
} }
/* convert a multi-byte string to a wchar with a limited length */
void pg_mb2wchar_with_len(const unsigned char *from, pg_wchar *to, int len) void pg_mb2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
{ {
(*pg_wchar_table[MB].mb2wchar_with_len)(from,to,len); (*pg_wchar_table[MB].mb2wchar_with_len)(from,to,len);
} }
/* returns the byte length of a multi-byte word */
int pg_mblen(const unsigned char *mbstr)
{
return((*pg_wchar_table[MB].mblen)(mbstr));
}
/* returns the length (counted as a wchar) of a multi-byte string */
int pg_mbstrlen(const unsigned char *mbstr)
{
int len = 0;
while (*mbstr) {
mbstr += pg_mblen(mbstr);
len++;
}
return(len);
}
/* returns the length (counted as a wchar) of a multi-byte string
(not necessarily NULL terminated) */
int pg_mbstrlen_with_len(const unsigned char *mbstr, int limit)
{
int len = 0;
int l;
while (*mbstr && limit > 0) {
l = pg_mblen(mbstr);
limit -= l;
mbstr += l;
len++;
}
return(len);
}
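The length helpers above can be tried outside the backend. What follows is a minimal sketch, not the committed code: `euc_jp_mblen` and `euc_jp_charlen` are hypothetical names, the lead-byte rules are copied from `pg_euc_mblen`, and the guard against a truncated trailing character is an added assumption that the original loop does not have.

```c
#include <assert.h>
#include <stddef.h>

#define SS2 0x8e /* single shift 2: introduces code set 2 */
#define SS3 0x8f /* single shift 3: introduces code set 3 */

/* Byte length of the EUC_JP character starting at s (always >= 1). */
int euc_jp_mblen(const unsigned char *s)
{
    if (*s == SS2)
        return 2;
    if (*s == SS3)
        return 3;
    if (*s & 0x80)
        return 2;
    return 1; /* plain ASCII */
}

/* Count the characters held in the first `limit` bytes of s. */
int euc_jp_charlen(const unsigned char *s, int limit)
{
    int count = 0;

    assert(s != NULL);
    /* Test the byte budget before dereferencing s. */
    while (limit > 0 && *s != '\0') {
        int l = euc_jp_mblen(s);

        if (l > limit) /* assumption: do not count a truncated character */
            break;
        s += l;
        limit -= l;
        count++;
    }
    return count;
}
```

Fed the 24 EUC_JP bytes of the first regression-test term, this sketch reports 12 characters, which lines up with the `character_length` and `octet_length` columns in the expected output of the EUC_JP test.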
/* /*
* Edmund Mergl <E.Mergl@bawue.de> * Edmund Mergl <E.Mergl@bawue.de>
* *
* $Id: oracle_compat.c,v 1.12 1998/02/26 04:37:19 momjian Exp $ * $Id: oracle_compat.c,v 1.13 1998/04/27 17:08:19 scrappy Exp $
* *
*/ */
...@@ -55,7 +55,7 @@ lower(text *string) ...@@ -55,7 +55,7 @@ lower(text *string)
while (m--) while (m--)
{ {
*ptr_ret++ = tolower(*ptr++); *ptr_ret++ = tolower((unsigned char)*ptr++);
} }
return ret; return ret;
...@@ -95,7 +95,7 @@ upper(text *string) ...@@ -95,7 +95,7 @@ upper(text *string)
while (m--) while (m--)
{ {
*ptr_ret++ = toupper(*ptr++); *ptr_ret++ = toupper((unsigned char)*ptr++);
} }
return ret; return ret;
...@@ -135,18 +135,18 @@ initcap(text *string) ...@@ -135,18 +135,18 @@ initcap(text *string)
ptr = VARDATA(string); ptr = VARDATA(string);
ptr_ret = VARDATA(ret); ptr_ret = VARDATA(ret);
*ptr_ret++ = toupper(*ptr++); *ptr_ret++ = toupper((unsigned char)*ptr++);
--m; --m;
while (m--) while (m--)
{ {
if (*(ptr_ret - 1) == ' ' || *(ptr_ret - 1) == '\t') if (*(ptr_ret - 1) == ' ' || *(ptr_ret - 1) == '\t')
{ {
*ptr_ret++ = toupper(*ptr++); *ptr_ret++ = toupper((unsigned char)*ptr++);
} }
else else
{ {
*ptr_ret++ = tolower(*ptr++); *ptr_ret++ = tolower((unsigned char)*ptr++);
} }
} }
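The reason for the `(unsigned char)` casts is that `toupper()` and `tolower()` are only defined for `EOF` and for values representable as `unsigned char`; where plain `char` is signed, an 8-bit byte such as 0xE9 arrives as a negative number. A small sketch of the same pattern, with `upper_bytes` as a hypothetical helper name:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Upper-case n bytes of buf in place, one byte at a time. */
void upper_bytes(char *buf, size_t n)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++) {
        /* Cast first so that bytes >= 0x80 are passed as 128..255
         * rather than as negative values. */
        buf[i] = (char) toupper((unsigned char) buf[i]);
    }
}
```

In the default "C" locale the high-bit bytes pass through unchanged; with an 8-bit locale selected through `setlocale()` they would be mapped according to that locale.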
......
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/utils/adt/varchar.c,v 1.29 1998/02/26 04:37:24 momjian Exp $ * $Header: /cvsroot/pgsql/src/backend/utils/adt/varchar.c,v 1.30 1998/04/27 17:08:26 scrappy Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -21,6 +21,8 @@ char *convertstr(char *, int, int); ...@@ -21,6 +21,8 @@ char *convertstr(char *, int, int);
#endif #endif
#include "regex/pg_wchar.h"
/* /*
* CHAR() and VARCHAR() types are part of the ANSI SQL standard. CHAR() * CHAR() and VARCHAR() types are part of the ANSI SQL standard. CHAR()
* is for blank-padded string whose length is specified in CREATE TABLE. * is for blank-padded string whose length is specified in CREATE TABLE.
...@@ -213,6 +215,31 @@ bcTruelen(char *arg) ...@@ -213,6 +215,31 @@ bcTruelen(char *arg)
int32 int32
bpcharlen(char *arg) bpcharlen(char *arg)
{
#ifdef MB
unsigned char *s;
int len, l, wl;
#endif
if (!PointerIsValid(arg))
elog(ERROR, "Bad (null) char() external representation", NULL);
#ifdef MB
l = bcTruelen(arg);
len = 0;
s = VARDATA(arg);
while (l > 0) {
wl = pg_mblen(s);
l -= wl;
s += wl;
len++;
}
return(len);
#else
return (bcTruelen(arg));
#endif
}
int32
bpcharoctetlen(char *arg)
{ {
if (!PointerIsValid(arg)) if (!PointerIsValid(arg))
elog(ERROR, "Bad (null) char() external representation", NULL); elog(ERROR, "Bad (null) char() external representation", NULL);
...@@ -354,9 +381,34 @@ bpcharcmp(char *arg1, char *arg2) ...@@ -354,9 +381,34 @@ bpcharcmp(char *arg1, char *arg2)
int32 int32
varcharlen(char *arg) varcharlen(char *arg)
{ {
#ifdef MB
unsigned char *s;
int len, l, wl;
#endif
if (!PointerIsValid(arg)) if (!PointerIsValid(arg))
elog(ERROR, "Bad (null) varchar() external representation", NULL); elog(ERROR, "Bad (null) varchar() external representation", NULL);
#ifdef MB
len = 0;
s = VARDATA(arg);
l = VARSIZE(arg) - VARHDRSZ;
while (l > 0) {
wl = pg_mblen(s);
l -= wl;
s += wl;
len++;
}
return(len);
#else
return (VARSIZE(arg) - VARHDRSZ);
#endif
}
int32
varcharoctetlen(char *arg)
{
if (!PointerIsValid(arg))
elog(ERROR, "Bad (null) varchar() external representation", NULL);
return (VARSIZE(arg) - VARHDRSZ); return (VARSIZE(arg) - VARHDRSZ);
} }
......
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
* *
* *
* IDENTIFICATION * IDENTIFICATION
* $Header: /cvsroot/pgsql/src/backend/utils/adt/varlena.c,v 1.32 1998/03/15 08:07:01 scrappy Exp $ * $Header: /cvsroot/pgsql/src/backend/utils/adt/varlena.c,v 1.33 1998/04/27 17:08:28 scrappy Exp $
* *
*------------------------------------------------------------------------- *-------------------------------------------------------------------------
*/ */
...@@ -18,6 +18,8 @@ ...@@ -18,6 +18,8 @@
#include "utils/palloc.h" #include "utils/palloc.h"
#include "utils/builtins.h" /* where function declarations go */ #include "utils/builtins.h" /* where function declarations go */
#include "regex/pg_wchar.h"
/***************************************************************************** /*****************************************************************************
* USER I/O ROUTINES * * USER I/O ROUTINES *
*****************************************************************************/ *****************************************************************************/
...@@ -198,18 +200,52 @@ textout(text *vlena) ...@@ -198,18 +200,52 @@ textout(text *vlena)
/* /*
* textlen - * textlen -
* returns the actual length of a text* * returns the logical length of a text*
* (which is less than the VARSIZE of the text*) * (which is less than the VARSIZE of the text*)
*/ */
int32 int32
textlen(text *t) textlen(text *t)
{ {
#ifdef MB
unsigned char *s;
int len, l, wl;
#endif
if (!PointerIsValid(t)) if (!PointerIsValid(t))
elog(ERROR, "Null input to textlen"); elog(ERROR, "Null input to textlen");
#ifdef MB
len = 0;
s = VARDATA(t);
l = VARSIZE(t) - VARHDRSZ;
while (l > 0) {
wl = pg_mblen(s);
l -= wl;
s += wl;
len++;
}
return(len);
#else
return (VARSIZE(t) - VARHDRSZ); return (VARSIZE(t) - VARHDRSZ);
#endif
} /* textlen() */ } /* textlen() */
/*
* textoctetlen -
* returns the physical length of a text*
* (which is less than the VARSIZE of the text*)
*/
int32
textoctetlen(text *t)
{
if (!PointerIsValid(t))
elog(ERROR, "Null input to textoctetlen");
return (VARSIZE(t) - VARHDRSZ);
} /* textoctetlen() */
/* /*
* textcat - * textcat -
* takes two text* and returns a text* that is the concatentation of * takes two text* and returns a text* that is the concatentation of
...@@ -278,17 +314,27 @@ textcat(text *t1, text *t2) ...@@ -278,17 +314,27 @@ textcat(text *t1, text *t2)
* *
* Note that the arguments operate on octet length, * Note that the arguments operate on octet length,
* so not aware of multi-byte character sets. * so not aware of multi-byte character sets.
*
* Added multi-byte support.
* - Tatsuo Ishii 1998-4-21
*/ */
text * text *
text_substr(text *string, int32 m, int32 n) text_substr(text *string, int32 m, int32 n)
{ {
text *ret; text *ret;
int len; int len;
#ifdef MB
int i;
char *p;
#endif
if ((string == (text *) NULL) || (m <= 0)) if ((string == (text *) NULL) || (m <= 0))
return string; return string;
len = VARSIZE(string) - VARHDRSZ; len = VARSIZE(string) - VARHDRSZ;
#ifdef MB
len = pg_mbstrlen_with_len(VARDATA(string),len);
#endif
/* m will now become a zero-based starting position */ /* m will now become a zero-based starting position */
if (m > len) if (m > len)
...@@ -303,6 +349,17 @@ text_substr(text *string, int32 m, int32 n) ...@@ -303,6 +349,17 @@ text_substr(text *string, int32 m, int32 n)
n = (len - m); n = (len - m);
} }
#ifdef MB
p = VARDATA(string);
for (i=0;i<m;i++) {
p += pg_mblen(p);
}
m = p - VARDATA(string);
for (i=0;i<n;i++) {
p += pg_mblen(p);
}
n = p - (VARDATA(string) + m);
#endif
ret = (text *) palloc(VARHDRSZ + n); ret = (text *) palloc(VARHDRSZ + n);
VARSIZE(ret) = VARHDRSZ + n; VARSIZE(ret) = VARHDRSZ + n;
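The MB branch of `text_substr` turns a character offset and count into a byte offset and byte length by walking the string one character at a time. The sketch below shows that conversion in isolation. `demo_mblen` is a hypothetical stand-in for `pg_mblen()` that treats any high-bit byte as the start of a two-byte character, and the bounds checks are an addition that the hunk above does not contain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_mblen(): 2 bytes if the high bit is set. */
int demo_mblen(const unsigned char *s)
{
    return (*s & 0x80) ? 2 : 1;
}

/*
 * Map a character range (0-based start, count) onto a byte range inside
 * the len-byte string s. Returns 0 on success, or -1 when the range
 * runs past the end of the string.
 */
int char_range_to_bytes(const unsigned char *s, size_t len,
                        size_t start, size_t count,
                        size_t *byte_off, size_t *byte_len)
{
    size_t pos = 0;
    size_t i;

    assert(s != NULL && byte_off != NULL && byte_len != NULL);

    /* Skip `start` characters to find the byte offset. */
    for (i = 0; i < start; i++) {
        if (pos >= len)
            return -1;
        pos += (size_t) demo_mblen(s + pos);
    }
    *byte_off = pos;

    /* Walk `count` more characters to find the byte length. */
    for (i = 0; i < count; i++) {
        if (pos >= len)
            return -1;
        pos += (size_t) demo_mblen(s + pos);
    }
    if (pos > len) /* last character was truncated */
        return -1;
    *byte_len = pos - *byte_off;
    return 0;
}
```

For a six-byte string made of one ASCII byte, two two-byte characters and one more ASCII byte, characters 1 and 2 map to byte offset 1 with byte length 4.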
...@@ -317,6 +374,9 @@ text_substr(text *string, int32 m, int32 n) ...@@ -317,6 +374,9 @@ text_substr(text *string, int32 m, int32 n)
* Implements the SQL92 POSITION() function. * Implements the SQL92 POSITION() function.
* Ref: A Guide To The SQL Standard, Date & Darwen, 1997 * Ref: A Guide To The SQL Standard, Date & Darwen, 1997
* - thomas 1997-07-27 * - thomas 1997-07-27
*
* Added multi-byte support.
* - Tatsuo Ishii 1998-4-21
*/ */
int32 int32
textpos(text *t1, text *t2) textpos(text *t1, text *t2)
...@@ -326,8 +386,11 @@ textpos(text *t1, text *t2) ...@@ -326,8 +386,11 @@ textpos(text *t1, text *t2)
p; p;
int len1, int len1,
len2; len2;
char *p1, pg_wchar *p1,
*p2; *p2;
#ifdef MB
pg_wchar *ps1, *ps2;
#endif
if (!PointerIsValid(t1) || !PointerIsValid(t2)) if (!PointerIsValid(t1) || !PointerIsValid(t2))
return (0); return (0);
...@@ -337,19 +400,36 @@ textpos(text *t1, text *t2) ...@@ -337,19 +400,36 @@ textpos(text *t1, text *t2)
len1 = (VARSIZE(t1) - VARHDRSZ); len1 = (VARSIZE(t1) - VARHDRSZ);
len2 = (VARSIZE(t2) - VARHDRSZ); len2 = (VARSIZE(t2) - VARHDRSZ);
#ifdef MB
ps1 = p1 = (pg_wchar *) palloc((len1 + 1)*sizeof(pg_wchar));
(void)pg_mb2wchar_with_len((unsigned char *)VARDATA(t1),p1,len1);
len1 = pg_wchar_strlen(p1);
ps2 = p2 = (pg_wchar *) palloc((len2 + 1)*sizeof(pg_wchar));
(void)pg_mb2wchar_with_len((unsigned char *)VARDATA(t2),p2,len2);
len2 = pg_wchar_strlen(p2);
#else
p1 = VARDATA(t1); p1 = VARDATA(t1);
p2 = VARDATA(t2); p2 = VARDATA(t2);
#endif
pos = 0; pos = 0;
px = (len1 - len2); px = (len1 - len2);
for (p = 0; p <= px; p++) for (p = 0; p <= px; p++)
{ {
#ifdef MB
if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0))
#else
if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0)) if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0))
#endif
{ {
pos = p + 1; pos = p + 1;
break; break;
}; };
p1++; p1++;
}; };
#ifdef MB
pfree(ps1);
pfree(ps2);
#endif
return (pos); return (pos);
} /* textpos() */ } /* textpos() */
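Once both strings have been converted to wide characters, the position search is an ordinary substring scan over fixed-width units, so the result is counted in characters rather than bytes. A self-contained sketch follows, with `uint32_t` standing in for `pg_wchar` and `wchar_position` as a hypothetical name. The wide values used to exercise it are simply the two EUC_JP bytes packed together, which is an assumption and not necessarily the layout `pg_mb2wchar_with_len` produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 1-based index of the first occurrence of needle in hay, counted in
 * characters; 0 when there is no match. An empty needle matches at 1.
 */
size_t wchar_position(const uint32_t *hay, size_t hay_len,
                      const uint32_t *needle, size_t needle_len)
{
    size_t i, j;

    assert(hay != NULL && needle != NULL);
    if (needle_len > hay_len)
        return 0;
    for (i = 0; i + needle_len <= hay_len; i++) {
        for (j = 0; j < needle_len; j++) {
            if (hay[i + j] != needle[j])
                break;
        }
        if (j == needle_len)
            return i + 1;
    }
    return 0;
}
```

With the twelve characters of the first regression-test term as the haystack and the single character from the `position()` test query as the needle, the sketch returns 7, the same value as the `strpos` column in the expected output.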
......
This diff is collapsed.
...@@ -199,6 +199,24 @@ AC_ARG_ENABLE( ...@@ -199,6 +199,24 @@ AC_ARG_ENABLE(
AC_MSG_RESULT(disabled) AC_MSG_RESULT(disabled)
) )
AC_MSG_CHECKING(setting MB)
AC_ARG_WITH(mb,
[ --with-mb=<encoding> enable multi-byte support ],
[
case "$withval" in
EUC_JP|EUC_CN|EUC_KR|EUC_TW|UNICODE|MULE_INTERNAL)
MB="$withval";
AC_MSG_RESULT("enabled with $withval")
;;
*)
AC_MSG_ERROR([*** You must supply an argument to the --with-mb option one of EUC_JP,EUC_CN,EUC_KR,EUC_TW,UNICODE,MULE_INTERNAL])
;;
esac
MB="$withval"
],
AC_MSG_RESULT("disabled")
)
dnl We use the default value of 5432 for the DEF_PGPORT value. If dnl We use the default value of 5432 for the DEF_PGPORT value. If
dnl we over-ride it with --with-pgport=port then we bypass this piece dnl we over-ride it with --with-pgport=port then we bypass this piece
AC_MSG_CHECKING(setting DEF_PGPORT) AC_MSG_CHECKING(setting DEF_PGPORT)
...@@ -305,6 +323,7 @@ AC_SUBST(DLSUFFIX) ...@@ -305,6 +323,7 @@ AC_SUBST(DLSUFFIX)
AC_SUBST(DL_LIB) AC_SUBST(DL_LIB)
AC_SUBST(USE_TCL) AC_SUBST(USE_TCL)
AC_SUBST(USE_PERL) AC_SUBST(USE_PERL)
AC_SUBST(MB)
dnl **************************************************************** dnl ****************************************************************
dnl Hold off on the C++ stuff until we can figure out why it doesn't dnl Hold off on the C++ stuff until we can figure out why it doesn't
......
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
* *
* Copyright (c) 1994, Regents of the University of California * Copyright (c) 1994, Regents of the University of California
* *
* $Id: pg_proc.h,v 1.53 1998/04/27 04:08:07 momjian Exp $ * $Id: pg_proc.h,v 1.54 1998/04/27 17:08:41 scrappy Exp $
* *
* NOTES * NOTES
* The script catalog/genbki.sh reads this file and generates .bki * The script catalog/genbki.sh reads this file and generates .bki
...@@ -201,6 +201,8 @@ DATA(insert OID = 1257 ( textlen PGUID 11 f t f 1 f 23 "25" 100 0 1 0 foo ...@@ -201,6 +201,8 @@ DATA(insert OID = 1257 ( textlen PGUID 11 f t f 1 f 23 "25" 100 0 1 0 foo
DESCR("length"); DESCR("length");
DATA(insert OID = 1258 ( textcat PGUID 11 f t f 2 f 25 "25 25" 100 0 1 0 foo bar )); DATA(insert OID = 1258 ( textcat PGUID 11 f t f 2 f 25 "25 25" 100 0 1 0 foo bar ));
DESCR("concat"); DESCR("concat");
DATA(insert OID = 1377 ( textoctetlen PGUID 11 f t f 1 f 23 "25" 100 0 1 0 foo bar ));
DESCR("octet length");
DATA(insert OID = 84 ( boolne PGUID 11 f t f 2 f 16 "16 16" 100 0 0 100 foo bar )); DATA(insert OID = 84 ( boolne PGUID 11 f t f 2 f 16 "16 16" 100 0 0 100 foo bar ));
DESCR("not equal"); DESCR("not equal");
...@@ -1444,7 +1446,11 @@ DESCR("does not match regex., case-insensitive"); ...@@ -1444,7 +1446,11 @@ DESCR("does not match regex., case-insensitive");
DATA(insert OID = 1251 ( bpcharlen PGUID 11 f t f 1 f 23 "1042" 100 0 0 100 foo bar )); DATA(insert OID = 1251 ( bpcharlen PGUID 11 f t f 1 f 23 "1042" 100 0 0 100 foo bar ));
DESCR("octet length"); DESCR("octet length");
DATA(insert OID = 1378 ( bpcharoctetlen PGUID 11 f t f 1 f 23 "1042" 100 0 0 100 foo bar ));
DESCR("octet length");
DATA(insert OID = 1253 ( varcharlen PGUID 11 f t f 1 f 23 "1043" 100 0 0 100 foo bar )); DATA(insert OID = 1253 ( varcharlen PGUID 11 f t f 1 f 23 "1043" 100 0 0 100 foo bar ));
DESCR("character length");
DATA(insert OID = 1379 ( varcharoctetlen PGUID 11 f t f 1 f 23 "1043" 100 0 0 100 foo bar ));
DESCR("octet length"); DESCR("octet length");
DATA(insert OID = 1263 ( text_timespan PGUID 11 f t f 1 f 1186 "25" 100 0 0 100 foo bar )); DATA(insert OID = 1263 ( text_timespan PGUID 11 f t f 1 f 1186 "25" 100 0 0 100 foo bar ));
...@@ -1550,10 +1556,17 @@ DESCR("convert"); ...@@ -1550,10 +1556,17 @@ DESCR("convert");
DATA(insert OID = 1370 ( timestamp PGUID 14 f t f 1 f 1296 "1184" 100 0 0 100 "select datetime_stamp($1)" - )); DATA(insert OID = 1370 ( timestamp PGUID 14 f t f 1 f 1296 "1184" 100 0 0 100 "select datetime_stamp($1)" - ));
DESCR("convert"); DESCR("convert");
DATA(insert OID = 1371 ( length PGUID 14 f t f 1 f 23 "25" 100 0 0 100 "select textlen($1)" - )); DATA(insert OID = 1371 ( length PGUID 14 f t f 1 f 23 "25" 100 0 0 100 "select textlen($1)" - ));
DESCR("octet length"); DESCR("character length");
DATA(insert OID = 1372 ( length PGUID 14 f t f 1 f 23 "1042" 100 0 0 100 "select bpcharlen($1)" - )); DATA(insert OID = 1372 ( length PGUID 14 f t f 1 f 23 "1042" 100 0 0 100 "select bpcharlen($1)" - ));
DESCR("octet length"); DESCR("character length");
DATA(insert OID = 1373 ( length PGUID 14 f t f 1 f 23 "1043" 100 0 0 100 "select varcharlen($1)" - )); DATA(insert OID = 1373 ( length PGUID 14 f t f 1 f 23 "1043" 100 0 0 100 "select varcharlen($1)" - ));
DESCR("character length");
DATA(insert OID = 1374 ( octet_length PGUID 14 f t f 1 f 23 "25" 100 0 0 100 "select textoctetlen($1)" - ));
DESCR("octet length");
DATA(insert OID = 1375 ( octet_length PGUID 14 f t f 1 f 23 "1042" 100 0 0 100 "select bpcharoctetlen($1)" - ));
DESCR("octet length");
DATA(insert OID = 1376 ( octet_length PGUID 14 f t f 1 f 23 "1043" 100 0 0 100 "select varcharoctetlen($1)" - ));
DESCR("octet length"); DESCR("octet length");
DATA(insert OID = 1380 ( date_part PGUID 14 f t f 2 f 701 "25 1184" 100 0 0 100 "select datetime_part($1, $2)" - )); DATA(insert OID = 1380 ( date_part PGUID 14 f t f 2 f 701 "25 1184" 100 0 0 100 "select datetime_part($1, $2)" - ));
......
/* $Id: pg_wchar.h,v 1.1 1998/03/15 07:38:47 scrappy Exp $ */ /* $Id: pg_wchar.h,v 1.2 1998/04/27 17:09:12 scrappy Exp $ */
#ifndef PG_WCHAR_H #ifndef PG_WCHAR_H
#define PG_WCHAR_H #define PG_WCHAR_H
...@@ -39,6 +39,9 @@ extern int pg_char_and_wchar_strcmp(const char *, const pg_wchar *); ...@@ -39,6 +39,9 @@ extern int pg_char_and_wchar_strcmp(const char *, const pg_wchar *);
extern int pg_wchar_strncmp(const pg_wchar *, const pg_wchar *, size_t); extern int pg_wchar_strncmp(const pg_wchar *, const pg_wchar *, size_t);
extern int pg_char_and_wchar_strncmp(const char *, const pg_wchar *, size_t); extern int pg_char_and_wchar_strncmp(const char *, const pg_wchar *, size_t);
extern size_t pg_wchar_strlen(const pg_wchar *); extern size_t pg_wchar_strlen(const pg_wchar *);
extern int pg_mblen(const unsigned char *);
extern int pg_mbstrlen(const unsigned char *);
extern int pg_mbstrlen_with_len(const unsigned char *, int);
#endif #endif
#endif #endif
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
* *
* Copyright (c) 1994, Regents of the University of California * Copyright (c) 1994, Regents of the University of California
* *
* $Id: builtins.h,v 1.40 1998/04/26 04:09:25 momjian Exp $ * $Id: builtins.h,v 1.41 1998/04/27 17:09:28 scrappy Exp $
* *
* NOTES * NOTES
* This should normally only be included by fmgr.h. * This should normally only be included by fmgr.h.
...@@ -400,6 +400,7 @@ extern bool bpchargt(char *arg1, char *arg2); ...@@ -400,6 +400,7 @@ extern bool bpchargt(char *arg1, char *arg2);
extern bool bpcharge(char *arg1, char *arg2); extern bool bpcharge(char *arg1, char *arg2);
extern int32 bpcharcmp(char *arg1, char *arg2); extern int32 bpcharcmp(char *arg1, char *arg2);
extern int32 bpcharlen(char *arg); extern int32 bpcharlen(char *arg);
extern int32 bpcharoctetlen(char *arg);
extern uint32 hashbpchar(struct varlena * key); extern uint32 hashbpchar(struct varlena * key);
extern char *varcharin(char *s, int dummy, int16 atttypmod); extern char *varcharin(char *s, int dummy, int16 atttypmod);
...@@ -412,6 +413,7 @@ extern bool varchargt(char *arg1, char *arg2); ...@@ -412,6 +413,7 @@ extern bool varchargt(char *arg1, char *arg2);
extern bool varcharge(char *arg1, char *arg2); extern bool varcharge(char *arg1, char *arg2);
extern int32 varcharcmp(char *arg1, char *arg2); extern int32 varcharcmp(char *arg1, char *arg2);
extern int32 varcharlen(char *arg); extern int32 varcharlen(char *arg);
extern int32 varcharoctetlen(char *arg);
extern uint32 hashvarchar(struct varlena * key); extern uint32 hashvarchar(struct varlena * key);
/* varlena.c */ /* varlena.c */
...@@ -425,6 +427,7 @@ extern bool text_le(text *arg1, text *arg2); ...@@ -425,6 +427,7 @@ extern bool text_le(text *arg1, text *arg2);
extern bool text_gt(text *arg1, text *arg2); extern bool text_gt(text *arg1, text *arg2);
extern bool text_ge(text *arg1, text *arg2); extern bool text_ge(text *arg1, text *arg2);
extern int32 textlen(text *arg); extern int32 textlen(text *arg);
extern int32 textoctetlen(text *arg);
extern int32 textpos(text *arg1, text *arg2); extern int32 textpos(text *arg1, text *arg2);
extern text *text_substr(text *string, int32 m, int32 n); extern text *text_substr(text *string, int32 m, int32 n);
......
...@@ -53,3 +53,35 @@ QUERY: select * from ...@@ -53,3 +53,35 @@ QUERY: select * from
コンピュータグラフィックス|分B10中 | コンピュータグラフィックス|分B10中 |
(2 rows) (2 rows)
QUERY: select *,character_length(用語) from 計算機用語;
用語 |分類コード|備考1aだよ|length
--------------------------+----------+----------+------
コンピュータディスプレイ |機A01上 | | 12
コンピュータグラフィックス|分B10中 | | 13
コンピュータプログラマー |人Z01下 | | 12
(3 rows)
QUERY: select *,octet_length(用語) from 計算機用語;
用語 |分類コード|備考1aだよ|octet_length
--------------------------+----------+----------+------------
コンピュータディスプレイ |機A01上 | | 24
コンピュータグラフィックス|分B10中 | | 26
コンピュータプログラマー |人Z01下 | | 24
(3 rows)
QUERY: select *,position('デ' in 用語) from 計算機用語;
用語 |分類コード|備考1aだよ|strpos
--------------------------+----------+----------+------
コンピュータディスプレイ |機A01上 | | 7
コンピュータグラフィックス|分B10中 | | 0
コンピュータプログラマー |人Z01下 | | 0
(3 rows)
QUERY: select *,substring(用語 from 10 for 4) from 計算機用語;
用語 |分類コード|備考1aだよ|substr
--------------------------+----------+----------+--------
コンピュータディスプレイ |機A01上 | |プレイ
コンピュータグラフィックス|分B10中 | |ィックス
コンピュータプログラマー |人Z01下 | |ラマー
(3 rows)
#!/bin/sh #!/bin/sh
# $Header: /cvsroot/pgsql/src/test/regress/Attic/regress.sh,v 1.18 1998/03/15 07:39:04 scrappy Exp $ # $Header: /cvsroot/pgsql/src/test/regress/Attic/regress.sh,v 1.19 1998/04/27 17:10:17 scrappy Exp $
# #
if echo '\c' | grep -s c >/dev/null 2>&1 if echo '\c' | grep -s c >/dev/null 2>&1
then then
...@@ -43,7 +43,7 @@ fi ...@@ -43,7 +43,7 @@ fi
echo "=============== running regression queries... =================" echo "=============== running regression queries... ================="
echo "" > regression.diffs echo "" > regression.diffs
if [ a$MB != a ];then if [ a$MB != a ];then
mbtests=`echo $MB|tr A-Z a-z` mbtests=`echo $MB|tr "[A-Z]" "[a-z]"`
else else
mbtests="" mbtests=""
fi fi
......
...@@ -13,3 +13,7 @@ select * from ...@@ -13,3 +13,7 @@ select * from
select * from 計算機用語 where 分類コード like '_Z%'; select * from 計算機用語 where 分類コード like '_Z%';
select * from 計算機用語 where 用語 ~ 'コンピュータ[デグ]'; select * from 計算機用語 where 用語 ~ 'コンピュータ[デグ]';
select * from 計算機用語 where 用語 ~* 'コンピュータ[デグ]'; select * from 計算機用語 where 用語 ~* 'コンピュータ[デグ]';
select *,character_length(用語) from 計算機用語;
select *,octet_length(用語) from 計算機用語;
select *,position('デ' in 用語) from 計算機用語;
select *,substring(用語 from 10 for 4) from 計算機用語;