From: t-ishii@sra.co.jp

Hi, here are the patches I promised (against 6.3.2):

* character_length(), position(), substring() are now aware of multi-byte characters
* add octet_length()
* add --with-mb option to configure
* new regression tests for EUC_KR (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
* add some test cases to the EUC_JP regression test
* fix problem in regress/regress.sh in case of System V
* fix toupper(), tolower() to handle 8bit chars

Note that:

o Patches for both configure.in and configure are included. Maybe the one for configure is not necessary.
o pg_proc.h was modified to add octet_length(). I used OIDs 1374-1379 for that. Please let me know if these numbers are not appropriate.

f554af0a · Marc G. Fournier · 2cbcf461
15 changed files
--- a/doc/README.mb
+++ b/doc/README.mb
-postgresql 6.3 multi-byte(MB) patch PL2 README	  Mar 10 1998
+postgresql 6.3 multi-byte (MB) support README	  April 21 1998
 						Tatsuo Ishii
 						t-ishii@sra.co.jp
@@ -6,13 +6,13 @@ postgresql 6.3 multi-byte(MB) patch PL2 README	  Mar 10 1998
 Introduction
-MB patch is intended for allowing PostgreSQL to handle multi-byte
+The MB support is intended to allow PostgreSQL to handle
-charachter sets such as EUC(Extende Unix Code), Unicode and Mule
+multi-byte character sets such as EUC (Extended Unix Code), Unicode and
-internal code. With the MB patch you can use multi-byte character sets
+Mule internal code. With MB enabled you can use multi-byte
-in regexp and LIKE. The encoding system chosen is determined at the
+character sets in regexp, LIKE and some functions. The encoding system
-compile time.
+chosen is determined at compile time.
-The patch also fixes some problems concerning with 8-bit single byte
+MB also fixes some problems concerning 8-bit single byte
 character sets including ISO8859. (I would not say all of the problems
 have been fixed. I just confirmed that the regression test ran fine
 and a few French characters could be used with the patch. Please let
@@ -20,26 +20,33 @@ me know if you find any problem while using 8-bit characters)
 How to use
-After applying the MB patch, create src/Makefile.custom with a line
+Create src/Makefile.custom with a line containing:
-including:
-MB=encoding_system
+	MB=encoding_system
+or run configure with the mb option:
+	% configure --with-mb=encoding_system
 where encoding_system is one of:
-EUC_JP			Japanese EUC
+	EUC_JP			Japanese EUC
-EUC_CN			Chinese EUC
+	EUC_CN			Chinese EUC
-EUC_KR			Korean EUC
+	EUC_KR			Korean EUC
-EUC_TW			Taiwan EUC
+	EUC_TW			Taiwan EUC
-UNICODE			Unicode(UTF-8)
+	UNICODE			Unicode(UTF-8)
-MULE_INTERNAL		Mule internal
+	MULE_INTERNAL		Mule internal
 Example:
-% cat Makefile.custom
+	% cat Makefile.custom
-MB=EUC_JP
+	MB=EUC_JP
+	or
-If MB is not defined, nothing is changed except better supporting for
+	% configure --with-mb=EUC_JP
+If MB is disabled, nothing is changed except better support for
 8-bit single byte character sets.
 References
@@ -59,6 +66,19 @@ Unicode: http://www.unicode.org/
 History
+April 21, 1998 some enhancements/fixes
+	* character_length(), position(), substring() are now aware of 
+	  multi-byte characters
+	* add octet_length()
+	* add --with-mb option to configure
+	* new regression tests for EUC_KR
+  	  (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
+	* add some test cases to the EUC_JP regression test
+	* fix problem in regress/regress.sh in case of System V
+	* fix toupper(), tolower() to handle 8bit chars
+Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1
 Mar 10, 1998 PL2 released
 	* add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
 	* add an English document (this file)

--- a/doc/README.mb.jp
+++ b/doc/README.mb.jp
-postgresql 6.3 multi-byte (MB) patch PL2 README	       created 1998/3/10
+postgresql 6.3.2 multi-byte (MB) support README	       created 1998/4/21
 							Tatsuo Ishii
 						t-ishii@sra.co.jp
 		  http://www.sra.co.jp/people/t-ishii/PostgreSQL/
Introduction:
-  This patch makes it possible for the latest version (6.3) of
-  PostgreSQL (http://www.postgresql.org/), a free RDBMS (Relational Database Management System),
+  Multi-byte support in PostgreSQL has the following features:
-  to handle multi-byte characters such as Japanese EUC. Applying this
-  patch makes the following possible:
    1. As the multi-byte character set, the EUC of each country (Japanese,
      Chinese, etc.), Unicode or mule internal code can be chosen at compile time. In the database
@@ -19,45 +17,24 @@ postgresql 6.3 multi-byte (MB) patch PL2 README	       created 1998/3/10
    4. Multi-byte characters can also be used in the data itself
    5. Regular expression searches on multi-byte characters can be used
    6. LIKE searches on multi-byte characters can be used
+    7. Multi-byte support in character_length(), position()
+      and substring()
-    (However, items 2, 3 and 4 work even without applying the patch.)
+Installation:
+  By default, PostgreSQL does not support multi-byte characters.
-How to obtain postgresql-6.3:
+  This section explains how to enable multi-byte support.
-  postgresql-6.3.tar.gz is available from the official Japanese mirror
-  site for postgresql, ftp://ftp.jaist.ac.jp/pub/dbms/PostgreSQL/.
-  If you cannot get it from there for some reason,
-  ftp://ftp.sra.co.jp/pub/cmd/postgres/6.3/ can also be used.
-  Note that the original postgresql ftp site is
-  ftp://ftp.postgresql.org.
-How to obtain this patch:
-  Please download
-  ftp://ftp.sra.co.jp/pub/cmd/postgres/6.3/patches/6.3mbPL2.patch.gz.
-How to apply the patch:
-  Unpack the patch file you downloaded:
-	% gunzip 6.3mbPL2.patch.gz
-  Unpack the postgresql-6.3 source:
-	% gtar xfz postgresql-6.3.tar.gz
-  This creates a directory called postgresql-6.3;
+  Create a file called src/Makefile.custom and add the line
-  cd into it:
-	% cd postgresql-6.3
-  Apply the patch:
-	% patch -p1 < 6.3mbPL2.patch
-  Next, create a file called src/Makefile.custom and add the line
	MB=EUC_JP
-  to it. The following encodings, including EUC_JP, can be specified:
+  to it. Alternatively, specify the encoding when running configure:
+  % configure --with-mb=EUC_JP
+  The following character encodings, including EUC_JP, can be specified
+  (in the current implementation the encoding is fixed at compile time
+   and cannot be changed dynamically at run time):
	EUC_JP		Japanese EUC
	EUC_CN		Chinese EUC based on GB. Code set 2 is
@@ -93,6 +70,22 @@ How to obtain postgresql-6.3:
 Revision history:
+  1998/4/21 new features / bug fixes
+	* multi-byte support in character_length(), position(),
+	  substring()
+	* add octet_length() (initdb must be rerun)
+	* add MB support to the configure options
+	  (ex. configure --with-mb=EUC_JP)
+	* add regression test for EUC_KR
+	  (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
+	* add character_length(), position(),
+	  substring() and octet_length() to the EUC_JP regression test
+	* fix an incompatibility of regress.sh on System V
+	* fix toupper() and tolower(), which could crash when given
+	  8-bit characters
+  1998/3/25 PostgreSQL 6.3.1 released; MB PL2 is incorporated
  1998/3/10 PL2 released
 	* add regression tests for EUC_JP, EUC_CN and MULE_INTERNAL
 	  (EUC_CN data contributed by he@sra.co.jp)

--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -7,7 +7,7 @@
 #
 #
 # IDENTIFICATION
-#    $Header: /cvsroot/pgsql/src/Makefile.global.in,v 1.40 1998/04/27 14:54:05 scrappy Exp $
+#    $Header: /cvsroot/pgsql/src/Makefile.global.in,v 1.41 1998/04/27 17:07:22 scrappy Exp $
 #
 # NOTES
 #    Essentially all Postgres make files include this file and use the 
@@ -147,6 +147,11 @@ X_CFLAGS= @X_CFLAGS@
 X_LIBS= @X_LIBS@
 X11_LIBS= -lX11 @X_EXTRA_LIBS@
+#
+# enable multi-byte support
+# choose one of:
+# EUC_JP,EUC_CN,EUC_KR,EUC_TW,UNICODE,MULE_INTERNAL
+MB=@MB@
 ##############################################################################
 #
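The Makefile only records which encoding was chosen; utils.c then selects the per-encoding routines by indexing pg_wchar_table[MB]. Below is a minimal sketch of that compile-time dispatch, assuming the build passes the choice as a preprocessor macro (for example -DMB=ENC_UTF8). The enum names, the two-row table and encoding_mblen() are hypothetical stand-ins, not the real definitions from pg_wchar.h.

```c
/* Hypothetical encoding IDs; assumed to follow the row order of a
 * table like pg_wchar_table[] (only two encodings are modelled here). */
enum { ENC_EUC_JP, ENC_UTF8, ENC_COUNT };

/* Assumption: the build passes the chosen encoding as a macro, e.g.
 * cc -DMB=ENC_UTF8 ...; fall back to EUC-JP when nothing is given. */
#ifndef MB
#define MB ENC_EUC_JP
#endif

/* Generic EUC rule (as in pg_euc_mblen()): SS2 (0x8e) + 1 byte,
 * SS3 (0x8f) + 2 bytes, other high-bit bytes start a 2-byte char. */
static int euc_mblen(const unsigned char *s)
{
    if (*s == 0x8e)
        return 2;
    if (*s == 0x8f)
        return 3;
    return (*s & 0x80) ? 2 : 1;
}

/* UTF-8 lead-byte rule, including 4-byte sequences. */
static int utf8_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0x00)
        return 1;
    if ((*s & 0xe0) == 0xc0)
        return 2;
    if ((*s & 0xf0) == 0xe0)
        return 3;
    if ((*s & 0xf8) == 0xf0)
        return 4;
    return 1;   /* stray continuation byte: step over it */
}

/* One row per encoding, in enum order. */
static int (*const mblen_table[ENC_COUNT])(const unsigned char *) = {
    euc_mblen,
    utf8_mblen,
};

/* The row is picked by MB, which is fixed when this file is compiled. */
static int encoding_mblen(const unsigned char *s)
{
    return mblen_table[MB](s);
}
```

Because the index is a compile-time constant, one build handles exactly one encoding, which is consistent with the README's statement that the encoding is determined at compile time. Unlike pg_utf_mblen() in this patch, the UTF-8 routine in the sketch also distinguishes 4-byte lead bytes.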

--- a/src/backend/regex/utils.c
+++ b/src/backend/regex/utils.c
 /*
 * misc conversion functions between pg_wchar and other encodings.
 * Tatsuo Ishii
- * $Id: utils.c,v 1.1 1998/03/15 07:38:39 scrappy Exp $
+ * $Id: utils.c,v 1.2 1998/04/27 17:07:53 scrappy Exp $
 */
 #include <regex/pg_wchar.h>
 /*
@@ -324,25 +324,151 @@ static void pg_mule2wchar_with_len(const unsigned char *from, pg_wchar *to, int
  *to = 0;
 }
+static int pg_euc_mblen(const unsigned char *s)
+{
+  int len;
+  if (*s == SS2) {
+    len = 2;
+  } else if (*s == SS3) {
+    len = 3;
+  } else if (*s & 0x80) {
+    len = 2;
+  } else {
+    len = 1;
+  }
+  return(len);
+}
+static int pg_eucjp_mblen(const unsigned char *s)
+{
+  return(pg_euc_mblen(s));
+}
+static int pg_euckr_mblen(const unsigned char *s)
+{
+  return(pg_euc_mblen(s));
+}
+static int pg_eucch_mblen(const unsigned char *s)
+{
+  int len;
+  if (*s == SS2) {
+    len = 3;
+  } else if (*s == SS3) {
+    len = 3;
+  } else if (*s & 0x80) {
+    len = 2;
+  } else {
+    len = 1;
+  }
+  return(len);
+}
+static int pg_euccn_mblen(const unsigned char *s)
+{
+  int len;
+  if (*s == SS2) {
+    len = 4;
+  } else if (*s == SS3) {
+    len = 3;
+  } else if (*s & 0x80) {
+    len = 2;
+  } else {
+    len = 1;
+  }
+  return(len);
+}
+static int pg_utf_mblen(const unsigned char *s)
+{
+  int len = 1;
+  if ((*s & 0x80) == 0) {
+    len = 1;
+  } else if ((*s & 0xe0) == 0xc0) {
+    len = 2;
+  } else if ((*s & 0xe0) == 0xe0) {
+    len = 3;
+  }
+  return(len);
+}
+static int pg_mule_mblen(const unsigned char *s)
+{
+  int len;
+  if (IS_LC1(*s)) {
+    len = 2;
+  } else if (IS_LCPRV1(*s)) {
+    len = 3;
+  } else if (IS_LC2(*s)) {
+    len = 3;
+  } else if (IS_LCPRV2(*s)) {
+    len = 4;
+  } else {	/* assume ASCII */
+    len = 1;
+  }
+  return(len);
+}
 typedef struct {
-  void	(*mb2wchar)();
+  void	(*mb2wchar)();		/* convert a multi-byte string to a wchar */
-  void	(*mb2wchar_with_len)();
+  void	(*mb2wchar_with_len)();	/* convert a multi-byte string to a wchar 
+				   with a limited length */
+  int	(*mblen)();		/* returns the length of a multi-byte word */
 } pg_wchar_tbl;
 static pg_wchar_tbl pg_wchar_table[] = {
-  {pg_eucjp2wchar, pg_eucjp2wchar_with_len},
+  {pg_eucjp2wchar, pg_eucjp2wchar_with_len, pg_eucjp_mblen},
-  {pg_eucch2wchar, pg_eucch2wchar_with_len},
+  {pg_eucch2wchar, pg_eucch2wchar_with_len, pg_eucch_mblen},
-  {pg_euckr2wchar, pg_euckr2wchar_with_len},
+  {pg_euckr2wchar, pg_euckr2wchar_with_len, pg_euckr_mblen},
-  {pg_euccn2wchar, pg_euccn2wchar_with_len},
+  {pg_euccn2wchar, pg_euccn2wchar_with_len, pg_euccn_mblen},
-  {pg_utf2wchar, pg_utf2wchar_with_len},
+  {pg_utf2wchar, pg_utf2wchar_with_len, pg_utf_mblen},
-  {pg_mule2wchar, pg_mule2wchar_with_len}};
+  {pg_mule2wchar, pg_mule2wchar_with_len, pg_mule_mblen}};
+/* convert a multi-byte string to a wchar */
 void pg_mb2wchar(const unsigned char *from, pg_wchar *to)
 {
  (*pg_wchar_table[MB].mb2wchar)(from,to);
 }
+/* convert a multi-byte string to a wchar with a limited length */
 void pg_mb2wchar_with_len(const unsigned char *from, pg_wchar *to, int len)
 {
  (*pg_wchar_table[MB].mb2wchar_with_len)(from,to,len);
 }
+/* returns the byte length of a multi-byte word */
+int pg_mblen(const unsigned char *mbstr)
+{
+  return((*pg_wchar_table[MB].mblen)(mbstr));
+}
+/* returns the length (counted as a wchar) of a multi-byte string */
+int pg_mbstrlen(const unsigned char *mbstr)
+{
+  int len = 0;
+  while (*mbstr) {
+    mbstr += pg_mblen(mbstr);
+    len++;
+  }
+  return(len);
+}
+/* returns the length (counted as a wchar) of a multi-byte string 
+   (not necessarily  NULL terminated) */
+int pg_mbstrlen_with_len(const unsigned char *mbstr, int limit)
+{
+  int len = 0;
+  int l;
+  while (*mbstr && limit > 0) {
+    l = pg_mblen(mbstr);
+    limit -= l;
+    mbstr += l;
+    len++;
+  }
+  return(len);
+}
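pg_mbstrlen_with_len() counts characters by repeatedly asking pg_mblen() how many bytes the next character occupies. The sketch below reproduces that loop for EUC-JP alone so it compiles on its own. eucjp_mblen() and eucjp_strlen_with_len() are hypothetical names, and the SS2/SS3 values 0x8e/0x8f are the standard EUC single-shift codes, assumed to match pg_wchar.h. One deliberate difference: the sketch tests the byte limit before dereferencing, so it never reads past a buffer that is not NUL-terminated.

```c
/* EUC single-shift codes; assumed to equal SS2/SS3 in pg_wchar.h. */
#define SS2 0x8e
#define SS3 0x8f

/* Byte length of the EUC-JP character starting at s (same rule as
 * pg_euc_mblen()): SS2 + 1 byte (half-width katakana), SS3 + 2 bytes
 * (JIS X 0212), any other high-bit byte starts a 2-byte JIS X 0208
 * character, and everything else is a 1-byte ASCII character. */
static int eucjp_mblen(const unsigned char *s)
{
    if (*s == SS2)
        return 2;
    if (*s == SS3)
        return 3;
    if (*s & 0x80)
        return 2;
    return 1;
}

/* Number of characters in the first `limit` bytes of s.  Like
 * pg_mbstrlen_with_len() a NUL byte also stops the scan, but the limit
 * is checked first so no byte beyond the buffer is read. */
static int eucjp_strlen_with_len(const unsigned char *s, int limit)
{
    int len = 0;

    while (limit > 0 && *s) {
        int l = eucjp_mblen(s);

        limit -= l;
        s += l;
        len++;
    }
    return len;
}
```

For the EUC-JP string コンピュータディスプレイ (12 two-byte katakana, 24 bytes) this loop yields 12, the value character_length() reports in the euc_jp regression output, while octet_length() reports 24.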
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
 /*
 *	Edmund Mergl <E.Mergl@bawue.de>
 *
- *	$Id: oracle_compat.c,v 1.12 1998/02/26 04:37:19 momjian Exp $
+ *	$Id: oracle_compat.c,v 1.13 1998/04/27 17:08:19 scrappy Exp $
 *
 */
@@ -55,7 +55,7 @@ lower(text *string)
 	while (m--)
 	{
-		*ptr_ret++ = tolower(*ptr++);
+		*ptr_ret++ = tolower((unsigned char)*ptr++);
 	}
 	return ret;
@@ -95,7 +95,7 @@ upper(text *string)
 	while (m--)
 	{
-		*ptr_ret++ = toupper(*ptr++);
+		*ptr_ret++ = toupper((unsigned char)*ptr++);
 	}
 	return ret;
@@ -135,18 +135,18 @@ initcap(text *string)
 	ptr = VARDATA(string);
 	ptr_ret = VARDATA(ret);
-	*ptr_ret++ = toupper(*ptr++);
+	*ptr_ret++ = toupper((unsigned char)*ptr++);
 	--m;
 	while (m--)
 	{
 		if (*(ptr_ret - 1) == ' ' || *(ptr_ret - 1) == '	')
 		{
-			*ptr_ret++ = toupper(*ptr++);
+			*ptr_ret++ = toupper((unsigned char)*ptr++);
 		}
 		else
 		{
-			*ptr_ret++ = tolower(*ptr++);
+			*ptr_ret++ = tolower((unsigned char)*ptr++);
 		}
 	}
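The only change in oracle_compat.c is the (unsigned char) cast before toupper()/tolower(). A small sketch of why it matters; upper_bytes() is a hypothetical helper, not a function from oracle_compat.c.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-case n bytes from src into dst.
 * <ctype.h> functions accept only EOF or a value representable as
 * unsigned char.  Where plain char is signed, an 8-bit byte such as
 * 0xe9 would otherwise arrive as a negative int, which is undefined
 * behaviour.  Casting to unsigned char first, as the patch does,
 * keeps the argument in the valid 0..255 range. */
static void upper_bytes(char *dst, const char *src, size_t n)
{
    while (n--)
        *dst++ = (char) toupper((unsigned char) *src++);
}
```

Under the default "C" locale the byte 0xe9 passes through unchanged; with an ISO 8859-1 locale selected through setlocale() it would typically map to 0xc9.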

--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -7,7 +7,7 @@
 *
 *
 * IDENTIFICATION
- *	  $Header: /cvsroot/pgsql/src/backend/utils/adt/varchar.c,v 1.29 1998/02/26 04:37:24 momjian Exp $
+ *	  $Header: /cvsroot/pgsql/src/backend/utils/adt/varchar.c,v 1.30 1998/04/27 17:08:26 scrappy Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -21,6 +21,8 @@ char	   *convertstr(char *, int, int);
 #endif
+#include "regex/pg_wchar.h"
 /*
 * CHAR() and VARCHAR() types are part of the ANSI SQL standard. CHAR()
 * is for blank-padded string whose length is specified in CREATE TABLE.
@@ -213,6 +215,31 @@ bcTruelen(char *arg)
 int32
 bpcharlen(char *arg)
+{
+#ifdef MB
+	unsigned char *s;
+	int len, l, wl;
+#endif
+	if (!PointerIsValid(arg))
+		elog(ERROR, "Bad (null) char() external representation", NULL);
+#ifdef MB
+	l = bcTruelen(arg);
+	len = 0;
+	s = VARDATA(arg);
+	while (l > 0) {
+	  wl = pg_mblen(s);
+	  l -= wl;
+	  s += wl;
+	  len++;
+	}
+	return(len);
+#else
+	return (bcTruelen(arg));
+#endif
+}
+int32
+bpcharoctetlen(char *arg)
 {
 	if (!PointerIsValid(arg))
 		elog(ERROR, "Bad (null) char() external representation", NULL);
@@ -354,9 +381,34 @@ bpcharcmp(char *arg1, char *arg2)
 int32
 varcharlen(char *arg)
 {
+#ifdef MB
+	unsigned char *s;
+	int len, l, wl;
+#endif
 	if (!PointerIsValid(arg))
 		elog(ERROR, "Bad (null) varchar() external representation", NULL);
+#ifdef MB
+	len = 0;
+	s = VARDATA(arg);
+	l = VARSIZE(arg) - VARHDRSZ;
+	while (l > 0) {
+	  wl = pg_mblen(s);
+	  l -= wl;
+	  s += wl;
+	  len++;
+	}
+	return(len);
+#else
+	return (VARSIZE(arg) - VARHDRSZ);
+#endif
+}
+int32
+varcharoctetlen(char *arg)
+{
+	if (!PointerIsValid(arg))
+		elog(ERROR, "Bad (null) varchar() external representation", NULL);
 	return (VARSIZE(arg) - VARHDRSZ);
 }
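For char(n), bpcharlen() first trims the blank padding with bcTruelen() and only then walks the remaining bytes one character at a time. Below is a compilable sketch of that order of operations, assuming an EUC encoding. euc_mblen(), true_len() and bpchar_char_len() are hypothetical stand-ins for pg_mblen(), bcTruelen() and the MB branch of bpcharlen().

```c
/* Stand-in for pg_mblen() under the generic EUC rule of pg_euc_mblen(). */
static int euc_mblen(const unsigned char *s)
{
    if (*s == 0x8e)         /* SS2 */
        return 2;
    if (*s == 0x8f)         /* SS3 */
        return 3;
    return (*s & 0x80) ? 2 : 1;
}

/* Length in bytes after dropping trailing blanks, like bcTruelen(). */
static int true_len(const unsigned char *s, int len)
{
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}

/* Character length of a blank-padded char(n) value of len bytes,
 * following the MB branch of bpcharlen(): trim first, then count. */
static int bpchar_char_len(const unsigned char *s, int len)
{
    int l = true_len(s, len);
    int n = 0;

    while (l > 0) {
        int wl = euc_mblen(s);

        l -= wl;
        s += wl;
        n++;
    }
    return n;
}
```

Trimming bytes before decoding is safe for EUC because the non-initial bytes of a multi-byte character are always 0xa1 or higher, so they can never be mistaken for a blank (0x20).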

--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -7,7 +7,7 @@
 *
 *
 * IDENTIFICATION
- *	  $Header: /cvsroot/pgsql/src/backend/utils/adt/varlena.c,v 1.32 1998/03/15 08:07:01 scrappy Exp $
+ *	  $Header: /cvsroot/pgsql/src/backend/utils/adt/varlena.c,v 1.33 1998/04/27 17:08:28 scrappy Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -18,6 +18,8 @@
 #include "utils/palloc.h"
 #include "utils/builtins.h"		/* where function declarations go */
+#include "regex/pg_wchar.h"
 /*****************************************************************************
 *	 USER I/O ROUTINES														 *
 *****************************************************************************/
@@ -198,18 +200,52 @@ textout(text *vlena)
 /*
 * textlen -
- *	  returns the actual length of a text*
+ *	  returns the logical length of a text*
 *	   (which is less than the VARSIZE of the text*)
 */
 int32
 textlen(text *t)
 {
+#ifdef MB
+	unsigned char *s;
+	int len, l, wl;
+#endif
 	if (!PointerIsValid(t))
 		elog(ERROR, "Null input to textlen");
+#ifdef MB
+	len = 0;
+	s = VARDATA(t);
+	l = VARSIZE(t) - VARHDRSZ;
+	while (l > 0) {
+	  wl = pg_mblen(s);
+	  l -= wl;
+	  s += wl;
+	  len++;
+	}
+	return(len);
+#else
 	return (VARSIZE(t) - VARHDRSZ);
+#endif
 }	/* textlen() */
+/*
+ * textoctetlen -
+ *	  returns the physical length of a text*
+ *	   (which is less than the VARSIZE of the text*)
+ */
+int32
+textoctetlen(text *t)
+{
+	if (!PointerIsValid(t))
+		elog(ERROR, "Null input to textoctetlen");
+	return (VARSIZE(t) - VARHDRSZ);
+}	/* textoctetlen() */
 /*
 * textcat -
 *	  takes two text* and returns a text* that is the concatenation of
@@ -278,17 +314,27 @@ textcat(text *t1, text *t2)
 *
 * Note that the arguments operate on octet length,
 *	so not aware of multi-byte character sets.
+ *
+ * Added multi-byte support.
+ * - Tatsuo Ishii 1998-4-21
 */
 text *
 text_substr(text *string, int32 m, int32 n)
 {
 	text	   *ret;
 	int			len;
+#ifdef MB
+	int i;
+	char *p;
+#endif
 	if ((string == (text *) NULL) || (m <= 0))
 		return string;
 	len = VARSIZE(string) - VARHDRSZ;
+#ifdef MB
+	len = pg_mbstrlen_with_len(VARDATA(string),len);
+#endif
 	/* m will now become a zero-based starting position */
 	if (m > len)
@@ -303,6 +349,17 @@ text_substr(text *string, int32 m, int32 n)
 			n = (len - m);
 	}
+#ifdef MB
+	p = VARDATA(string);
+	for (i=0;i<m;i++) {
+	  p += pg_mblen(p);
+	}
+	m = p - VARDATA(string);
+	for (i=0;i<n;i++) {
+	  p += pg_mblen(p);
+	}
+	n = p - (VARDATA(string) + m);
+#endif
 	ret = (text *) palloc(VARHDRSZ + n);
 	VARSIZE(ret) = VARHDRSZ + n;
@@ -317,6 +374,9 @@ text_substr(text *string, int32 m, int32 n)
 *	  Implements the SQL92 POSITION() function.
 *	  Ref: A Guide To The SQL Standard, Date & Darwen, 1997
 * - thomas 1997-07-27
+ *
+ * Added multi-byte support.
+ * - Tatsuo Ishii 1998-4-21
 */
 int32
 textpos(text *t1, text *t2)
@@ -326,8 +386,11 @@ textpos(text *t1, text *t2)
 				p;
 	int			len1,
 				len2;
-	char	   *p1,
-			   *p2;
+#ifdef MB
+	pg_wchar   *p1, *p2, *ps1, *ps2;
+#else
+	char	   *p1, *p2;
+#endif
 	if (!PointerIsValid(t1) || !PointerIsValid(t2))
 		return (0);
@@ -337,19 +400,36 @@ textpos(text *t1, text *t2)
 	len1 = (VARSIZE(t1) - VARHDRSZ);
 	len2 = (VARSIZE(t2) - VARHDRSZ);
+#ifdef MB
+	ps1 = p1 = (pg_wchar *) palloc((len1 + 1)*sizeof(pg_wchar));
+	(void)pg_mb2wchar_with_len((unsigned char *)VARDATA(t1),p1,len1);
+	len1 = pg_wchar_strlen(p1);
+	ps2 = p2 = (pg_wchar *) palloc((len2 + 1)*sizeof(pg_wchar));
+	(void)pg_mb2wchar_with_len((unsigned char *)VARDATA(t2),p2,len2);
+	len2 = pg_wchar_strlen(p2);
+#else
 	p1 = VARDATA(t1);
 	p2 = VARDATA(t2);
+#endif
 	pos = 0;
 	px = (len1 - len2);
 	for (p = 0; p <= px; p++)
 	{
+#ifdef MB
+		if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0))
+#else
 		if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0))
+#endif
 		{
 			pos = p + 1;
 			break;
 		};
 		p1++;
 	};
+#ifdef MB
+	pfree(ps1);
+	pfree(ps2);
+#endif
 	return (pos);
 }	/* textpos() */
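text_substr() keeps the SQL arguments in characters, clamps them against the character length, and only then converts them to byte offsets by stepping with pg_mblen(). The sketch below isolates that conversion. char_to_byte_offset() and substr_byte_range() are hypothetical names, euc_mblen() stands in for pg_mblen() under an EUC encoding, and the caller is assumed to have already clamped m and n the way text_substr() does.

```c
#include <stddef.h>

/* Stand-in for pg_mblen() under the generic EUC rule. */
static int euc_mblen(const unsigned char *s)
{
    if (*s == 0x8e)         /* SS2 */
        return 2;
    if (*s == 0x8f)         /* SS3 */
        return 3;
    return (*s & 0x80) ? 2 : 1;
}

/* Byte offset of the character with zero-based index nchars, found by
 * stepping one character at a time like the MB loops in text_substr(). */
static size_t char_to_byte_offset(const unsigned char *s, int nchars)
{
    const unsigned char *p = s;

    while (nchars-- > 0)
        p += euc_mblen(p);
    return (size_t) (p - s);
}

/* Byte range for substring(s from m for n), with m 1-based and both
 * counted in characters.  Assumes m and n were already clamped to the
 * string's character length, as text_substr() does before this step. */
static void substr_byte_range(const unsigned char *s, int m, int n,
                              size_t *start, size_t *nbytes)
{
    *start = char_to_byte_offset(s, m - 1);
    *nbytes = char_to_byte_offset(s + *start, n);
}
```

In the euc_jp regression test, substring(用語 from 10 for 4) on コンピュータディスプレイ asks for four characters but only three remain from the tenth onward, so after clamping the result is プレイ: 6 bytes starting at byte offset 18.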

--- a/src/configure
+++ b/src/configure
--- a/src/configure.in
+++ b/src/configure.in
@@ -199,6 +199,24 @@ AC_ARG_ENABLE(
   AC_MSG_RESULT(disabled)
 )
+AC_MSG_CHECKING(setting MB)
+AC_ARG_WITH(mb,
+    [  --with-mb=<encoding> enable multi-byte support ], 
+    [
+	case "$withval" in
+	EUC_JP|EUC_CN|EUC_KR|EUC_TW|UNICODE|MULE_INTERNAL)
+            MB="$withval";
+	    AC_MSG_RESULT("enabled with $withval")
+            ;;
+	*)
+	    AC_MSG_ERROR([*** You must supply an argument to the --with-mb option: one of EUC_JP,EUC_CN,EUC_KR,EUC_TW,UNICODE,MULE_INTERNAL])
+	  ;;
+	esac
+	MB="$withval"
+    ],
+    AC_MSG_RESULT("disabled")
+)
 dnl We use the default value of 5432 for the DEF_PGPORT value.	If
 dnl we over-ride it with --with-pgport=port then we bypass this piece
 AC_MSG_CHECKING(setting DEF_PGPORT)
@@ -305,6 +323,7 @@ AC_SUBST(DLSUFFIX)
 AC_SUBST(DL_LIB)
 AC_SUBST(USE_TCL)
 AC_SUBST(USE_PERL)
+AC_SUBST(MB)
 dnl ****************************************************************
 dnl Hold off on the C++ stuff until we can figure out why it doesn't 

--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -6,7 +6,7 @@
 *
 * Copyright (c) 1994, Regents of the University of California
 *
- * $Id: pg_proc.h,v 1.53 1998/04/27 04:08:07 momjian Exp $
+ * $Id: pg_proc.h,v 1.54 1998/04/27 17:08:41 scrappy Exp $
 *
 * NOTES
 *	  The script catalog/genbki.sh reads this file and generates .bki
@@ -201,6 +201,8 @@ DATA(insert OID = 1257 (  textlen		   PGUID 11 f t f 1 f 23 "25" 100 0 1 0  foo
 DESCR("length");
 DATA(insert OID = 1258 (  textcat		   PGUID 11 f t f 2 f 25 "25 25" 100 0 1 0	foo bar ));
 DESCR("concat");
+DATA(insert OID = 1377 (  textoctetlen		   PGUID 11 f t f 1 f 23 "25" 100 0 1 0  foo bar ));
+DESCR("octet length");
 DATA(insert OID =  84 (  boolne			   PGUID 11 f t f 2 f 16 "16 16" 100 0 0 100  foo bar ));
 DESCR("not equal");
@@ -1444,7 +1446,11 @@ DESCR("does not match regex., case-insensitive");
 DATA(insert OID = 1251 (  bpcharlen		   PGUID 11 f t f 1 f 23 "1042" 100 0 0 100  foo bar ));
 DESCR("octet length");
+DATA(insert OID = 1378 (  bpcharoctetlen		   PGUID 11 f t f 1 f 23 "1042" 100 0 0 100  foo bar ));
+DESCR("octet length");
 DATA(insert OID = 1253 (  varcharlen	   PGUID 11 f t f 1 f 23 "1043" 100 0 0 100  foo bar ));
+DESCR("character length");
+DATA(insert OID = 1379 (  varcharoctetlen	   PGUID 11 f t f 1 f 23 "1043" 100 0 0 100  foo bar ));
 DESCR("octet length");
 DATA(insert OID = 1263 (  text_timespan    PGUID 11 f t f 1 f 1186 "25" 100 0 0 100  foo bar ));
@@ -1550,10 +1556,17 @@ DESCR("convert");
 DATA(insert OID = 1370 (  timestamp			 PGUID 14 f t f 1 f 1296 "1184" 100 0 0 100  "select datetime_stamp($1)" - ));
 DESCR("convert");
 DATA(insert OID = 1371 (  length			 PGUID 14 f t f 1 f   23   "25" 100 0 0 100  "select textlen($1)" - ));
-DESCR("octet length");
+DESCR("character length");
 DATA(insert OID = 1372 (  length			 PGUID 14 f t f 1 f   23   "1042" 100 0 0 100  "select bpcharlen($1)" - ));
-DESCR("octet length");
+DESCR("character length");
 DATA(insert OID = 1373 (  length			 PGUID 14 f t f 1 f   23   "1043" 100 0 0 100  "select varcharlen($1)" - ));
+DESCR("character length");
+DATA(insert OID = 1374 (  octet_length			 PGUID 14 f t f 1 f   23   "25" 100 0 0 100  "select textoctetlen($1)" - ));
+DESCR("octet length");
+DATA(insert OID = 1375 (  octet_length			 PGUID 14 f t f 1 f   23   "1042" 100 0 0 100  "select bpcharoctetlen($1)" - ));
+DESCR("octet length");
+DATA(insert OID = 1376 (  octet_length			 PGUID 14 f t f 1 f   23   "1043" 100 0 0 100  "select varcharoctetlen($1)" - ));
 DESCR("octet length");
 DATA(insert OID = 1380 (  date_part    PGUID 14 f t f 2 f  701 "25 1184" 100 0 0 100  "select datetime_part($1, $2)" - ));

--- a/src/include/regex/pg_wchar.h
+++ b/src/include/regex/pg_wchar.h
-/* $Id: pg_wchar.h,v 1.1 1998/03/15 07:38:47 scrappy Exp $ */
+/* $Id: pg_wchar.h,v 1.2 1998/04/27 17:09:12 scrappy Exp $ */
 #ifndef PG_WCHAR_H
 #define PG_WCHAR_H
@@ -39,6 +39,9 @@ extern int pg_char_and_wchar_strcmp(const char *, const pg_wchar *);
 extern int pg_wchar_strncmp(const pg_wchar *, const pg_wchar *, size_t);
 extern int pg_char_and_wchar_strncmp(const char *, const pg_wchar *, size_t);
 extern size_t pg_wchar_strlen(const pg_wchar *);
+extern int pg_mblen(const unsigned char *);
+extern int pg_mbstrlen(const unsigned char *);
+extern int pg_mbstrlen_with_len(const unsigned char *, int);
 #endif
 #endif
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -6,7 +6,7 @@
 *
 * Copyright (c) 1994, Regents of the University of California
 *
- * $Id: builtins.h,v 1.40 1998/04/26 04:09:25 momjian Exp $
+ * $Id: builtins.h,v 1.41 1998/04/27 17:09:28 scrappy Exp $
 *
 * NOTES
 *	  This should normally only be included by fmgr.h.
@@ -400,6 +400,7 @@ extern bool bpchargt(char *arg1, char *arg2);
 extern bool bpcharge(char *arg1, char *arg2);
 extern int32 bpcharcmp(char *arg1, char *arg2);
 extern int32 bpcharlen(char *arg);
+extern int32 bpcharoctetlen(char *arg);
 extern uint32 hashbpchar(struct varlena * key);
 extern char *varcharin(char *s, int dummy, int16 atttypmod);
@@ -412,6 +413,7 @@ extern bool varchargt(char *arg1, char *arg2);
 extern bool varcharge(char *arg1, char *arg2);
 extern int32 varcharcmp(char *arg1, char *arg2);
 extern int32 varcharlen(char *arg);
+extern int32 varcharoctetlen(char *arg);
 extern uint32 hashvarchar(struct varlena * key);
 /* varlena.c */
@@ -425,6 +427,7 @@ extern bool text_le(text *arg1, text *arg2);
 extern bool text_gt(text *arg1, text *arg2);
 extern bool text_ge(text *arg1, text *arg2);
 extern int32 textlen(text *arg);
+extern int32 textoctetlen(text *arg);
 extern int32 textpos(text *arg1, text *arg2);
 extern text *text_substr(text *string, int32 m, int32 n);

--- a/src/test/regress/expected/euc_jp.out
+++ b/src/test/regress/expected/euc_jp.out
@@ -53,3 +53,35 @@ QUERY: select * from 
 コンピュータグラフィックス|分B10中   |          
 (2 rows)
+QUERY: select *,character_length(用語) from 計算機用語;
+用語                      |分類コード|備考1aだよ|length
+--------------------------+----------+----------+------
+コンピュータディスプレイ  |機A01上   |          |    12
+コンピュータグラフィックス|分B10中   |          |    13
+コンピュータプログラマー  |人Z01下   |          |    12
+(3 rows)
+QUERY: select *,octet_length(用語) from 計算機用語;
+用語                      |分類コード|備考1aだよ|octet_length
+--------------------------+----------+----------+------------
+コンピュータディスプレイ  |機A01上   |          |          24
+コンピュータグラフィックス|分B10中   |          |          26
+コンピュータプログラマー  |人Z01下   |          |          24
+(3 rows)
+QUERY: select *,position('デ' in 用語) from 計算機用語;
+用語                      |分類コード|備考1aだよ|strpos
+--------------------------+----------+----------+------
+コンピュータディスプレイ  |機A01上   |          |     7
+コンピュータグラフィックス|分B10中   |          |     0
+コンピュータプログラマー  |人Z01下   |          |     0
+(3 rows)
+QUERY: select *,substring(用語 from 10 for 4) from 計算機用語;
+用語                      |分類コード|備考1aだよ|substr  
+--------------------------+----------+----------+--------
+コンピュータディスプレイ  |機A01上   |          |プレイ  
+コンピュータグラフィックス|分B10中   |          |ィックス
+コンピュータプログラマー  |人Z01下   |          |ラマー  
+(3 rows)
--- a/src/test/regress/regress.sh
+++ b/src/test/regress/regress.sh
 #!/bin/sh
-# $Header: /cvsroot/pgsql/src/test/regress/Attic/regress.sh,v 1.18 1998/03/15 07:39:04 scrappy Exp $
+# $Header: /cvsroot/pgsql/src/test/regress/Attic/regress.sh,v 1.19 1998/04/27 17:10:17 scrappy Exp $
 #
 if echo '\c' | grep -s c >/dev/null 2>&1
 then
@@ -43,7 +43,7 @@ fi
 echo "=============== running regression queries...         ================="
 echo "" > regression.diffs
 if [ a$MB != a ];then
-	mbtests=`echo $MB|tr A-Z a-z`
+	mbtests=`echo $MB|tr "[A-Z]" "[a-z]"`
 else
 	mbtests=""
 fi

--- a/src/test/regress/sql/euc_jp.sql
+++ b/src/test/regress/sql/euc_jp.sql
@@ -13,3 +13,7 @@ select * from 
 select * from 計算機用語 where 分類コード like '_Z%';
 select * from 計算機用語 where 用語 ~ 'コンピュータ[デグ]';
 select * from 計算機用語 where 用語 ~* 'コンピュータ[デグ]';
+select *,character_length(用語) from 計算機用語;
+select *,octet_length(用語) from 計算機用語;
+select *,position('デ' in 用語) from 計算機用語;
+select *,substring(用語 from 10 for 4) from 計算機用語;