Linus Tolke Michael Meskes 1996-1997 Linus Tolke 1998 Michael Meskes Transcribed 1998-02-12 <application>ecpg</application> - Embedded <acronym>SQL</acronym> in <acronym>C</acronym> This describes an embedded SQL in C package for Postgres. It is written by Linus Tolke and Michael Meskes. Permission is granted to copy and use in the same way as you are allowed to copy and use the rest of PostgreSQL. Why Embedded <acronym>SQL</acronym>? Embedded SQL has some small advantages over other ways to handle SQL queries. It takes care of all the tedious moving of information to and from variables in your C program. Many RDBMS packages support this embedded language. There is an ANSI-standard describing how the embedded language should work. ecpg was designed to meet this standard as much as possible. So it is possible to port programs with embedded SQL written for other RDBMS packages to Postgres and thus promoting the spirit of free software. The Concept You write your program in C with some special SQL things. For declaring variables that can be used in SQL statements you need to put them in a special declare section. You use a special syntax for the SQL queries. Before compiling you run the file through the embedded SQL C preprocessor and it converts the SQL statements you used to function calls with the variables used as arguments. Both variables that are used as input to the SQL statements and variables that will contain the result are passed. Then you compile and at link time you link with a special library that contains the functions used. These functions (actually it is mostly one single function) fetches the information from the arguments, performs the SQL query using the ordinary interface (libpq) and puts back the result in the arguments dedicated for output. Then you run your program and when the control arrives to the SQL statement the SQL statement is performed against the database and you can continue with the result. How To Use <application>ecpg</application> This section describes how to use the ecpg tool. Preprocessor The preprocessor is called ecpg. After installation it resides in the Postgres bin/ directory. Library The ecpg library is called libecpg.a or libecpg.so. Additionally, the library uses the libpq library for communication to the Postgres server so you will have to link your program with -lecpg -lpq. The library has some methods that are "hidden" but that could prove very useful sometime. ECPGdebug(int on, FILE *stream) turns on debug logging if called with the first argument non-zero. Debug logging is done on stream. Most SQL statement logs its arguments and result. The most important one (ECPGdo) that is called on almost all SQL statements logs both its expanded string, i.e. the string with all the input variables inserted, and the result from the Postgres server. This can be very useful when searching for errors in your SQL statements. ECPGstatus() This method returns TRUE if we are connected to a database and FALSE if not. Error handling To be able to detect errors from the Postgres server you include a line like exec sql include sqlca; in the include section of your file. This will define a struct and a variable with the name sqlca as following: struct sqlca { char sqlcaid[8]; long sqlabc; long sqlcode; struct { int sqlerrml; char sqlerrmc[70]; } sqlerrm; char sqlerrp[8]; long sqlerrd[6]; /* 0: empty */ /* 1: OID of processed tuple if applicable */ /* 2: number of rows processed in an INSERT, UPDATE */ /* or DELETE statement */ /* 3: empty */ /* 4: empty */ /* 5: empty */ char sqlwarn[8]; /* 0: set to 'W' if at least one other is 'W' */ /* 1: if 'W' at least one character string */ /* value was truncated when it was */ /* stored into a host variable. */ /* 2: empty */ /* 3: empty */ /* 4: empty */ /* 5: empty */ /* 6: empty */ /* 7: empty */ char sqlext[8]; } sqlca; If an error occured in the last SQL statement then sqlca.sqlcode will be non-zero. If sqlca.sqlcode is less that 0 then this is some kind of serious error, like the database definition does not match the query given. If it is bigger than 0 then this is a normal error like the table did not contain the requested row. sqlca.sqlerrm.sqlerrmc will contain a string that describes the error. The string ends with the line number in the source file. List of errors that can occur: -12, Out of memory in line %d. Does not normally occur. This is a sign that your virtual memory is exhausted. -200, Unsupported type %s on line %d. Does not normally occur. This is a sign that the preprocessor has generated something that the library does not know about. Perhaps you are running incompatible versions of the preprocessor and the library. -201, Too many arguments line %d. This means that Postgres has returned more arguments than we have matching variables. Perhaps you have forgotten a couple of the host variables in the INTO :var1,:var2-list. -202, Too few arguments line %d. This means that Postgres has returned fewer arguments than we have host variables. Perhaps you have too many host variables in the INTO :var1,:var2-list. -203, Too many matches line %d. This means that the query has returned several lines but the variables specified are no arrays. The SELECT you made probably was not unique. -204, Not correctly formatted int type: %s line %d. This means that the host variable is of an int type and the field in the Postgres database is of another type and contains a value that cannot be interpreted as an int. The library uses strtol for this conversion. -205, Not correctly formatted unsigned type: %s line %d. This means that the host variable is of an unsigned int type and the field in the Postgres database is of another type and contains a value that cannot be interpreted as an unsigned int. The library uses strtoul for this conversion. -206, Not correctly formatted floating point type: %s line %d. This means that the host variable is of a float type and the field in the Postgres database is of another type and contains a value that cannot be interpreted as an float. The library uses strtod for this conversion. -207, Unable to convert %s to bool on line %d. This means that the host variable is of a bool type and the field in the Postgres database is neither 't' nor 'f'. -208, Empty query line %d. Postgres returned PGRES_EMPTY_QUERY, probably because the query indeed was empty. -220, No such connection %s in line %d. The program tries to access a connection that does not exist. -221, Not connected in line %d. The program tries to access a connection that does exist but is not open. -230, Invalid statement name %s in line %d. The statement you are trying to use has not been prepared. -400, Postgres error: %s line %d. Some Postgres error. The message contains the error message from the Postgres backend. -401, Error in transaction processing line %d. Postgres signalled to us that we cannot start, commit or rollback the transaction. -402, connect: could not open database %s. The connect to the database did not work. 100, Data not found line %d. This is a "normal" error that tells you that what you are quering cannot be found or we have gone through the cursor. Limitations What will never be included and why or what cannot be done with this concept. Oracle's single tasking possibility Oracle version 7.0 on AIX 3 uses the OS-supported locks on the shared memory segments and allows the application designer to link an application in a so called single tasking way. Instead of starting one client process per application process both the database part and the application part is run in the same process. In later versions of Oracle this is no longer supported. This would require a total redesign of the Postgres access model and that effort can not justify the performance gained. Porting From Other <acronym>RDBMS</acronym> Packages The design of ecpg follows SQL standard. So porting from a standard RDBMS should not be a problem. Unfortunately there is no such thing as a standard RDBMS. So ecpg also tries to understand syntax additions as long as they do not create conflicts with the standard. The following list shows all the known incompatibilities. If you find one not listed please notify Michael Meskes. Note, however, that we list only incompatibilities from a precompiler of another RDBMS to ecpg and not additional ecpg features that these RDBMS do not have. Syntax of FETCH command The standard syntax of the FETCH command is: FETCH [direction] [amount] IN|FROM cursor name. ORACLE, however, does not use the keywords IN resp. FROM. This feature cannot be added since it would create parsing conflicts. Installation Since version 0.5 ecpg is distributed together with Postgres. So you should get your precompiler, libraries and header files compiled and installed by default as a part of your installation. For the Developer This section is for those who want to develop the ecpg interface. It describes how the things work. The ambition is to make this section contain things for those that want to have a look inside and the section on How to use it should be enough for all normal questions. So, read this before looking at the internals of the ecpg. If you are not interested in how it really works, skip this section. ToDo List This version the preprocessor has some flaws: Library functions to_date et al. do not exists. But then Postgres has some good conversion routines itself. So you probably won't miss these. Structures ans unions Structures and unions have to be defined in the declare section. Missing statements The following statements are not implemented thus far: exec sql allocate exec sql deallocate SQLSTATE message 'no data found' The error message for "no data" in an exec sql insert select from statement has to be 100. sqlwarn[6] sqlwarn[6] should be 'W' if the PRECISION or SCALE value specified in a SET DESCRIPTOR statement will be ignored. The Preprocessor The first four lines written to the output are constant additions by ecpg. These are two comments and two include lines necessary for the interface to the library. Then the preprocessor works in one pass only, reading the input file and writing to the output as it goes along. Normally it just echoes everything to the output without looking at it further. When it comes to an EXEC SQL statements it intervenes and changes them depending on what it is. The EXEC SQL statement can be one of these: Declare sections Declare sections begins with exec sql begin declare section; and ends with exec sql end declare section; In the section only variable declarations are allowed. Every variable declare within this section is also entered in a list of variables indexed on their name together with the corresponding type. In particular the definition of a structure or union also has to be listed inside a declare section. Otherwise ecpg cannot handle these types since it simply does not know the definition. The declaration is echoed to the file to make the variable a normal C-variable also. The special types VARCHAR and VARCHAR2 are converted into a named struct for every variable. A declaration like: VARCHAR var[180]; is converted into struct varchar_var { int len; char arr[180]; } var; Include statements An include statement looks like: exec sql include filename; Note that this is NOT the same as #include <filename.h> Instead the file specified is parsed by ecpg itself. So the contents of the specified file is included in the resulting C code. This way you are able to specify EXEC SQL commands in an include file. Connect statement A connect statement looks like: exec sql connect to connection target; It creates a connection to the specified database. The connection target can be specified in the following ways: dbname[@server][:port][as connection name][user user name] tcp:postgresql://server[:port][/dbname][as connection name][user user name] unix:postgresql://server[:port][/dbname][as connection name][user user name] character variable[as connection name][user user name] character string[as connection name][user] default user There are also different ways to specify the user name: userid userid/password userid identified by password userid using password Finally the userid and the password. Each may be a constant text, a character variable or a chararcter string. Disconnect statements A disconnect statement looks loke: exec sql disconnect [connection target]; It closes the connection to the specified database. The connection target can be specified in the following ways: connection name default current all Open cursor statement An open cursor statement looks like: exec sql open cursor; and is ignore and not copied from the output. Commit statement A commit statement looks like exec sql commit; and is translated on the output to ECPGcommit(__LINE__); Rollback statement A rollback statement looks like exec sql rollback; and is translated on the output to ECPGrollback(__LINE__); Other statements Other SQL statements are other statements that start with exec sql and ends with ;. Everything inbetween is treated as an SQL statement and parsed for variable substitution. Variable substitution occur when a symbol starts with a colon (:). Then a variable with that name is looked for among the variables that were previously declared within a declare section and depending on the variable being for input or output the pointers to the variables are written to the output to allow for access by the function. For every variable that is part of the SQL request the function gets another ten arguments: The type as a special symbol. A pointer to the value or a pointer to the pointer. The size of the variable if it is a char or varchar. Number of elements in the array (for array fetches). The offset to the next element in the array (for array fetches) The type of the indicator variable as a special symbol. A pointer to the value of the indicator variable or a pointer to the pointer of the indicator variable. 0. Number of elements in the indicator array (for array fetches). The offset to the next element in the indicator array (for array fetches) A Complete Example Here is a complete example describing the output of the preprocessor of a file foo.pgc: exec sql begin declare section; int index; int result; exec sql end declare section; ... exec sql select res into :result from mytable where index = :index; is translated into: /* Processed by ecpg (2.6.0) */ /* These two include files are added by the preprocessor */ #include <ecpgtype.h>; #include <ecpglib.h>; /* exec sql begin declare section */ #line 1 "foo.pgc" int index; int result; /* exec sql end declare section */ ... ECPGdo(__LINE__, NULL, "select res from mytable where index = ? ", ECPGt_int,&(index),1L,1L,sizeof(int), ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EOIT, ECPGt_int,&(result),1L,1L,sizeof(int), ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EORT); #line 147 "foo.pgc" (the indentation in this manual is added for readability and not something that the preprocessor can do.) The Library The most important function in the library is the ECPGdo function. It takes a variable amount of arguments. Hopefully we will not run into machines with limits on the amount of variables that can be accepted by a vararg function. This could easily add up to 50 or so arguments. The arguments are: A line number This is a line number for the original line used in error messages only. A string This is the SQL request that is to be issued. This request is modified by the input variables, i.e. the variables that where not known at compile time but are to be entered in the request. Where the variables should go the string contains ";". Input variables As described in the section about the preprocessor every input variable gets ten arguments. ECPGt_EOIT An enum telling that there are no more input variables. Output variables As described in the section about the preprocessor every input variable gets ten arguments. These variables are filled by the function. ECPGt_EORT An enum telling that there are no more variables. All the SQL statements are performed in one transaction unless you issue a commit transaction. To get this auto-transaction going the first statement or the first after statement after a commit or rollback always begins a transaction. To disable this feature per default use the option on the commandline. To be completed: entries describing the other entries.