kinetic Resources logo  
  MOREnet logo
kinetic Home | kinetic Manager | E-mail Services | Web Site Management | USENET News | Modem Pools
 Home >  Web Site Management >  Java Application Server >  MySQL Database >  MySQL Best Practices
     
MySQL Database  
     
  MySQL Best Practices  
     

View the printer friendly version of this document

MySQL Best Practices

Revised: February 13, 2003

Important: The Java application server and MySQL database are not included as part of the kinetic Service annual fee. These technologies are available on an as-requested basis for an additional fee. Contact David Drum at david@more.net or (800) 509-6673 for additional information.

Contents

Overview

MySQL best practices involves the use of proven principles for getting the highest performance and security from a MySQL database. Applying the concepts discussed here can have a dramatic impact of the overall speed and security of a web application. Moreover, careful consideration to these principles can help ensure a web application is highly robust and runs trouble free consuming the least amount of computing resources while in service.

Optimizing Queries

Optimizations involving queries are at the heart of getting excellent performance from a MySQL database. This process begins with creating a realistic data set for testing the SQL that is executed by the database. Data is then collected from this test environment that can be analyzed to pinpoint problems either unique to specific queries or related to table structure. Once changes are made data can be collected again to ensure each optimization has achieved the maximum performance out of the system. The following topics discuss this process in greater detail.

Configure a test data set

Configure your development MySQL database with a full data set that realistically reflects the data that will exist when the application is put into production. Consider both the variation of the data values (e.g. extreme values) as well as the expected number of records.

Collect test data

The MySQL Java JDBC driver is capable of profiling the SQL that is executed. Queries, including execution/fetch times, are recorded to Standard Error. Profiling is enabled by editing the Tomcat server.xml configuration file. Modify the database connect string of the JNDI named JDBC Datasource to include the profileSql property. The following snipit from server.xml illustrates enabling profiling (note that {db_name} will need to be replaced with the name of your MySQL database):
<parameter>
    <name>url</name>
    <value>jdbc:mysql://localhost/{db_name}?autoReconnect=true&amp;profileSql=true</value>
</parameter>

Use EXPLAIN to analyze SQL statements

Sometimes the indexes that are needed for a table are not always obvious. Other conditions of a SQL query may cause MySQL to perform additional work. The EXPLAIN facility can be very helpful in identifying additional indexes that may be needed.

Each query should be examined separately for performance. The use of MySQL's EXPLAIN facility can be used to get information about SELECT statements. An EXPLAIN analysis shows, for example, possible columns that should be indexed and the the order in which MySQL will join tables in complex queries. Many other additional parameters are also displayed by EXPLAIN that assist in understanding how MySQL will resolve the query.

The "Extra", "Key", and "Type" columns of the output generated by EXPLAIN are important to watch. In the "Extra" column, a listing of 'Using Filesort' indicates that MySQL had to make an extra pass through the data, storing the resulting keys and pointers, then re-retrieve the data in sorted order (usually using a generated temp file). This can potentially be very demanding on system resources and increase the time needed to deliver the query's results.

Additionally, the "Key" column indicates which index MySQL used while performing the query. This is good, as it means the query was less demanding of system resources and could be delivered more quickly. A value of 'NULL' means either MySQL did not have an index to use or it could not use a given index for that particular query (e.g. a multi-table query). Consider adding an index if possible or specifically telling MySQL to use a specific index.

If more than one table is being accessed by the query, the "Type" column becomes important also. The "Type" column will indicate the type of join that MySQL can do when querying across the multiple tables. If the value in the "Type" column is 'All' (i.e. a full table scan), then the query in question can potentially get expensive (high processing time), especially if it is listing this in a row beyond the first table.

Lastly, if the value for "Key" is 'NULL' and the value for "Type" is 'ALL' in a multi-table query, there is a potential for a very expensive query in both processing time and temporary file usage, magnified by the number of tables to be gone through. In this situation, consider breaking up the query into smaller interim results. Consider simplifying the SQL by moving business logic performed by the query into the application (e.g. separate queries vs. a large single query with many joins).

Use complex SQL statements with caution

It is not always necessary to perform one single monolithic query to create the entire end result set.
Consider if there is a point where a smaller result set from one query can be used to build the next (also smaller) result set, with the end result being equal to the larger (and more expensive) single-query result set. In certain cases, you can strike a balance between code processing and query processing of data that results in an over-all faster and more efficient processing.

Ensure all SQL limits the number of records returned

Tables can sometimes grow to excessively large numbers of rows. When designing SQL queries ensure constraints exist that will always limit the number of records returned to a reasonable subset of the total number of records stored in a table. Limits ensure reliable performance and also tend to improve the usability of a web application.

Use LAST_INSERT_ID() to obtain unique Ids

If you need to retrieve the value of the unique ID from the last INSERT statement into a table that has an AUTO_INCREMENT attribute, use the LAST_INSERT_ID() function. The following example uses this function to insert a record into a second table referencing the unique key generated by the first INSERT:

INSERT INTO table1( auto,text ) VALUES( NULL, 'text' );
INSERT INTO table2( id, text ) VALUES( LAST_INSERT_ID(), 'text' );
The most recently generated ID is maintained by MySQL on a per-connection basis and is, therefore, thread safe.

Table Indexes are the key to fast queries

Indexes are key to ensuring fast and efficient queries. In fact, it cannot be emphasized enough how important the use of table indexes can be to the response time of a database query. Without indexed tables, MySQL will be forced to perform a full table scan, reading each row in the table, as it works to retrieve the requested records.

Additional work will need to be performed by MySQL anytime a query uses the ORDER BY or GROUP BY keywords. For these queries, indexes play an even more important role. When you want to use an ORDER BY or GROUP BY on a given field for queries, be sure to set an index for that field. If you will be grouping or ordering by multiple fields, create an index which combines those fields. Be aware that MySQL will not be able to utilize an index with ORDER BY or GROUP BY if the field used to select the records (WHERE = field1) is not the same as the field by which the records are ordered or grouped (GROUP BY field2).

Preventing SQL Spoofing

SQL spoofing occurs when users can enter values which alter the intent of the SQL query your programs issue.

Consider a web form that allows the user to specify the Id number of a data record to search for. The id number is supposed to be a number, and if the user types in "137" the program issues a select statement to find the record with id = "137".

String qstr = "SELECT * FROM table1 WHERE id = " + usersInput;
would result in the following query...
"SELECT * FROM table1 WHERE id = 137"
Unfortunately, a malicious user could instead type "137 or TRUE" resulting in the following query...
"SELECT * FROM table1 WHERE id = 137 OR TRUE"
The problem here is that TRUE always evaluates to TRUE for all records evaluated by the query. The result of this query would be a listing of all records in the table!

The next example illustrates the extent of damage that can occur when an UPDATE statement is used to modify the phone number of the record associated with this id.

String qstr = "UPDATE table1 SET phone = '" + usersInput2 + "' WHERE id = " + usersInput;
would result in the following update query...
"UPDATE table1 SET phone = '555-1234' WHERE id = 137 OR TRUE"
The result of this would be that all records in the table would have their "phone" field set to "555-1234"!

Proper quoting of user entered values is key to preventing SQL spoofing. If, in the above examples, the value entered by the malicious user had been properly quoted (i.e.- id = "137 OR TRUE" ), the problem would have been avoided. To ensure that values are properly quoted, make use of Java Prepared Statements. Prepared Statements automatically evaluate and quote the values of variables so that they do not alter the intent of the SQL query. The following example illustrates the use of a Prepared Statement using the above SELECT query:

PreparedStatement pstmt = db_conn.prepareStatement( "SELECT * FROM table1 WHERE id = ?");
pstmt.setInt(1,137);

Table Locking

Normally, you should not need to lock tables. MySQL treats all single UPDATE statements as self-contained transactions and, therefore, no other UPDATE or INSERT can interfere with a currently executing UPDATE or INSERT. MySQL Table Locking Issues may cause other user's queries to be delayed until the locked table is unlocked. In some cases, this can result in timeouts to their client browser. Locking tables, if used at all, is most applicable to situations such as late night bulk maintenance/updates which occur across multiple tables. Especially, when data interdependencies exist between the set of tables.

Table Types

The latest distributions of MySQL use the MYISAM table type by default. This table type is a greatly improved implementation of the original ISAM table type. The original ISAM table type is now deprecated and will disappear at some point in the future and it should not be used.

Database Connection Pooling/Object Caching

Database connection pooling improves the performance of applications which use a database by maintaining a pool of database connections and reusing them. Database object caching is a middleware technology used to persist and cache objects from a database and can also significantly improve performance. These technologies are more fully discussed in the Web Application Design Guidelines document.

Essential Reading For MySQL Best Practices from the MySQL Manual

The entire MySQL manual can be downloaded in several formats and is the definitive source for optimizing MySQL databases. The following chapters provide deeper insight into some of the more important topics of this discussion.