Purchase | Copyright © 2002 Paul Sheer. Click here for copying permissions. | Home |
This chapter will show you how to set up an SQL server for free.
Structured Query Language (SQL) is a programming language developed specifically to access data arranged in tables of rows and columns--as in a database--as well as do searching, sorting and cross-referencing of that data.
Typically, the database tables will sit in files managed by an SQL server daemon process. The SQL server will listen on a TCP socket for incoming requests from client machines and will service those requests.
SQL has become a de facto industry standard. However, the protocols (over TCP/IP) by which those SQL requests are sent are different from implementation to implementation.
SQL requests can usually be typed in manually from a command-line interface. This is difficult for most users, so a GUI interface will usually hide this process from the user.
SQL servers and SQL support software is major institution. Management of database tables is actually a complicated affair. A good SQL server will properly streamline multiple simultaneous requests that may access and modify rows in the same table. Doing this efficiently, along with the many types of complex searches and cross-referencing, while also ensuring data integrity, is a complex task.
postgres (PostGreSQL) is a free SQL server written under the BSD license. postgres supports an extended subset of SQL92. [The definitive SQL standard.] It does a lot of very nifty things that no other database can (it seems). About the only commercial equivalent worth buying over postgres is a certain very expensive industry leader. postgres runs on every flavor of UNIX and also on Windows NT.
The postgres documentation proudly states:
The Object-Relational Database Management System now known as PostgreSQL (and briefly called Postgres95) is derived from the Postgres package written at Berkeley. With over a decade of development behind it, PostgreSQL is the most advanced open-source database available anywhere, offering multi-version concurrency control, supporting almost all SQL constructs (including subselects, transactions, and user-defined types and functions), and having a wide range of language bindings available (including C, C++, Java, Perl, Tcl, and Python).
postgres is also fairly dry. Most people ask why it doesn't have a graphical frontend. Considering that it runs on so many different platforms, it makes sense for it to be purely a back-end engine. A graphical interface is a different kind of software project that would probably support more than one type of database server at the back and possibly run under only one kind of graphical interface.
The postgres package consists of the files described in the next two sections:
The postgres packages consists of the user programs
|
createdb dropdb pg_dump psql createlang droplang pg_dumpall vacuumdb createuser dropuser pg_id |
and the server programs
|
initdb pg_ctl pg_upgrade postgresql-dump initlocation pg_encoding pg_version postmaster ipcclean pg_passwd postgres |
Each of these programs has a man page which you should get an inkling of.
Further man pages provide references to actual SQL commands. Try man l select (explained further on):
5 10 15 |
SELECT(l) SELECT(l) NAME SELECT - Retrieve rows from a table or view. SYNOPSIS SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ] expression [ AS name ] [, ...] [ INTO [ TEMPORARY | TEMP ] [ TABLE ] new_table ] [ FROM table [ alias ] [, ...] ] [ WHERE condition ] [ GROUP BY column [, ...] ] [ HAVING condition [, ...] ] [ { UNION [ ALL ] | INTERSECT | EXCEPT } select ] [ ORDER BY column [ ASC | DESC | USING operator ] [, ...] ] [ FOR UPDATE [ OF class_name [, ...] ] ] LIMIT { count | ALL } [ { OFFSET | , } start ] |
Most important is the enormous amount of HTML documentation that comes with postgres. Point your web browser to /usr/doc/postgresql-? .? .? (or /usr/share/doc/...), then dive into the admin, user, programmer, tutorial, and postgres directories.
Finally, there are the start and stop scripts in /etc/rc.d/init.d/ (or /etc/init.d/) and the directory in which the database tables themselves are stored: /var/lib/pgsql/.
postgres can be gotten prepackaged for your favorite distribution. Simply install the package using rpm or dpkg and then follow the instructions given below.
Stop the postgres server if it is running; the init.d script may be called postgres or postgresql (Debian commands in parentheses):
|
/etc/rc.d/init.d/postgres stop ( /etc/init.d/postgresql stop ) |
Edit the init.d script to support TCP requests. There will be a line like the following to which you can add the -i option. Mine looks like:
|
su -l postgres -c "/usr/bin/pg_ctl -D $PGDATA \ -p /usr/bin/postmaster -o '-i -o -e' start >/dev/null 2>&1" |
which also (with the -o -e option) forces European date formats (28/4/1984 instead of 4/28/1984). Note that hosts will not be able to connect unless you edit your /var/lib/pgsql/data/pg_hba.conf ( /etc/postgresql/pg_hba.conf on Debian) file, and add lines like
|
host mydatabase 192.168.4.7 255.255.255.255 trust |
In either case, you should check this file to ensure that only trusted hosts can connect to your database, or remove the -i option altogether if you are only connecting from the local machine. To a limited extent, you can also limit what users can connect within this file.
It would be nice if the UNIX domain socket that postgres listens on (i.e., /tmp/.s.PGSQL.5432) had permissions 0770 instead of 0777. That way, you could limit connections to only those users belonging to the postgres group. You can add this feature by searching for the C chmod command within src/backend/libpq/pqcomm.c inside the postgres-7.0 sources. Later versions may have added a feature to set the permissions on this socket.
To run postgres, you need a user of that name. If you do not already have one then enter
|
/usr/sbin/useradd postgres |
and restart the server with
|
/etc/rc.d/init.d/postgresql restart |
The postgres init.d script initializes a template database on first run, so you may have to start it twice.
Now you can create your own database. The following example creates a database finance as well as a postgres user finance. It does these creations while being user postgres (this is what the -U option is for). You should run these commands as user root or as user postgres without the -U postgres.
5 |
[root@cericon]# /usr/sbin/useradd finance [root@cericon]# createuser -U postgres --adduser --createdb finance CREATE USER [root@cericon]# createdb -U finance finance CREATE DATABASE [root@cericon]# |
Now that the database exists, you can begin running SQL queries.
5 10 15 20 25 30 35 40 |
[root@cericon]# psql -U finance Welcome to psql, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit finance=# select * from pg_tables; tablename | tableowner | hasindexes | hasrules | hastriggers ----------------+------------+------------+----------+------------- pg_type | postgres | t | f | f pg_attribute | postgres | t | f | f pg_proc | postgres | t | f | f pg_class | postgres | t | f | f pg_group | postgres | t | f | f pg_database | postgres | f | f | f pg_variable | postgres | f | f | f pg_log | postgres | f | f | f pg_xactlock | postgres | f | f | f pg_attrdef | postgres | t | f | f pg_relcheck | postgres | t | f | f pg_trigger | postgres | t | f | f pg_inherits | postgres | t | f | f pg_index | postgres | t | f | f pg_statistic | postgres | t | f | f pg_operator | postgres | t | f | f pg_opclass | postgres | t | f | f pg_am | postgres | t | f | f pg_amop | postgres | t | f | f pg_amproc | postgres | f | f | f pg_language | postgres | t | f | f pg_aggregate | postgres | t | f | f pg_ipl | postgres | f | f | f pg_inheritproc | postgres | f | f | f pg_rewrite | postgres | t | f | f pg_listener | postgres | t | f | f pg_description | postgres | t | f | f pg_shadow | postgres | f | f | t (28 rows) |
The preceeding rows are postgres's internal tables. Some are actual tables, and some are views of tables. [A selective representation of an actual table.]
To get a list of databases, try:
5 |
finance=# select * from pg_database; datname | datdba | encoding | datpath -----------+--------+----------+----------- template1 | 24 | 0 | template1 finance | 26 | 0 | finance (2 rows) |
The following are 99% of the commands you are ever going to use. (Note that all SQL commands require a semicolon at the end--you won't be the first person to ask why nothing happens when you press without the semicolon.)
To create a table called people, with three columns:
|
CREATE TABLE people ( name text, gender bool, address text ); |
The created table will title the columns, name, gender, and address. Columns are typed. This means that only the kind of data that was specified at the time of creation can go in that column. In the case of gender, it can only be true or false for the boolean type, which we will associate to the male and female genders. There is probably no reason to use the boolean value here: using an integer or text field can often be far more descriptive and flexible. In the case of name and address, these can hold anything, since they are of the text type, which is the most encompassing type of all.
Note that in the postgres documentation, a ``column'' is called an ``attribute'' for historical reasons.
You should try to choose types according to the kind of searches you are going to do and not according to the data it holds. Table 38.1 lists the most of the useful types as well as their SQL92 equivalents. The types in bold are to be used in preference to other similar types for greater range or precision:
The SELECT statement is the most widely used statement in SQL. It returns data from tables and can do searches:
|
finance=# SELECT * FROM PEOPLE; name | gender | address ------+--------+--------- (0 rows) |
The ALTER statement changes something:
5 |
finance=# ALTER TABLE people ADD COLUMN phone text; ALTER finance=# SELECT * FROM people; name | gender | address | phone ------+--------+---------+------- (0 rows) |
You cannot drop columns in postgres; you must create a new table from the old table without the column. How to do this will become obvious further on.
Use the DROP command to delete most things:
|
DROP TABLE people; |
Insert a row with (you can continue typing over multiple lines):
|
finance=# INSERT INTO people (name, gender, address, phone) finance-# VALUES ('Paul Sheer', true, 'Earth', '7617224'); INSERT 20280 1 |
The return value is the oid (Object ID) of the row. postgres is an Object Relational database. This term gets thrown around a lot, but it really means that every table has a hidden column called the oid column that stores a unique identity number for each row. The identity number is unique across the entire database. Because it uniquely identifies rows across all tables, you could call the rows ``objects.'' The oid feature is most useful to programmers.
The oid of the above row is 20280. To find it:
5 |
finance=# SELECT * FROM people WHERE oid = 20280; name | gender | address | phone ------------+--------+---------+--------- Paul Sheer | true | Earth | 7617224 (1 row) |
To list selected columns, try:
|
SELECT name, address FROM people; SELECT oid, name FROM people; SELECT oid, * FROM people; |
It should be obvious what these do.
Here we create a new table and fill two of its columns from columns in our original table:
|
finance=# CREATE TABLE sitings (person text, place text, siting text); CREATE finance=# INSERT INTO sitings (person, place) SELECT name, address FROM people; INSERT 20324 1 |
Delete selected rows, like
|
finance=# DELETE FROM people WHERE name = 'Paul Sheer'; DELETE 1 |
About the simplest search you can do with postgres is
|
SELECT * FROM people WHERE name LIKE '%Paul%'; |
Or alternatively, case insensitively and across the address field:
|
SELECT * FROM people WHERE lower(name) LIKE '%paul%' OR lower(address) LIKE '%paul%'; |
The first % is a wildcard that matches any length of text before the Paul, and the final % matches any text after. It is the usual way of searching with a field, instead of trying to get an exact match.
The possibilities are endless:
|
SELECT * FROM people WHERE gender = true AND phone = '8765432'; |
The command
|
COPY people TO '/tmp/people.txt'; |
dumps the people table to /tmp/people.txt, as tab delimeter, newline terminated rows.
The command,
|
COPY people WITH OIDS TO '/tmp/people.txt' DELIMITERS ',' WITH NULL AS '(null)'; |
dumps the people table to /tmp/people.txt, as comma-delimited, newline-terminated rows, with (null) whereever there is supposed to be a zero byte.
Similarly, the command
|
COPY people FROM '/tmp/people.txt'; |
inserts into the table people the rows from /tmp/people.txt. It assumes one line per row and the tab character between each cell.
Note that unprintable characters are escaped with a backslash \ in both output and the interpretation of input.
Hence, it is simple to get data from another database. You just have to work out how to dump it as text.
The command pg_dump <database-name> dumps your entire database as plain text. If you try this on your database, you will notice that the output contains straightforward SQL commands. Your database can be reconstructed from scratch by piping this output through stdin of the psql command. In other words, pg_dump merely produces the exact sequence of SQL commands necessary to reproduce your database.
Sometimes a new version of postgres will switch to a database file format that is incompatible with your previous files. In this case it is prudent to do a pg_dumpall (and carefully save the output) before upgrading. The output of pg_dumpall can once again be fed through stdin of the psql command and contains all the commands necessary to reconstruct all your databases as well as all the data they contain.
When you have some very complicated set of tables in front of you, you are likely to want to merge, select, search, and cross-reference them in innumerable ways to get the information you want out of them.
Being able to efficiently query the database in this way is the true power of SQL, but this is about as far as I am going to go here. The postgres documentation cited above contains details on everything you can do.
University Computer Science majors learn about subjects like Entity Modeling, Relational Algebra, and Database Normalization. These are formal academic methods according to which good databases are designed. You should not venture into constructing any complex database without these methods.
Most university book shops will have academic books that teach formal database theory.