DTL Performance Tips
Here is a short list of tips to get the best performance out of DTL.
1. For large data sets, use a fixed-length string such as char[N] or tcstring<N> in your classes to read or write character data.
This allows DTL to bind the data directly to your class. If you use std::string in your
class, DTL assumes that the string might be arbitrarily long and calls SQLGetData / SQLPutData to read or write
the information, which is slower than binding a fixed-length string directly.
2. If you are reading or writing a large number of records at once, use the
bulk_copy or bulk_copy_helper methods, as these can use ODBC bulk operations to read/write the data, which is
much faster than working with one record at a time. This tip gives the biggest improvement for insert operations:
bulk fetch is about 10% faster than fetching one row at a time, but bulk insert is often more than twice as fast as
inserting one row at a time.
3. If you are going to execute a SQL statement multiple times, try to re-use the iterator (and its associated DBView).
Re-using an existing iterator to re-execute an existing statement is much faster than re-creating the statement
every time.
To get a feel for how much overhead is incurred in working with a DBView, we ran some tests comparing a DBView::select_iterator against an equivalent select using the raw ODBC API:
// Read the contents of the DB_EXAMPLE table and return a vector of the
// resulting rows.
// Uses DTL.
vector<Example> ReadData()
{
    vector<Example> results;
    DBView<Example> view("DB_EXAMPLE");
    DBView<Example>::select_iterator read_it = view.begin();
    DBView<Example>::select_iterator end_it = view.end();
    for ( ; read_it != end_it; ++read_it)
    {
        results.push_back(*read_it);
    }
    return results;
}
// Read the contents of the DB_EXAMPLE table and return a vector of the
// resulting rows.
// Uses the raw ODBC API.
vector<Example> RawReadData()
{
    vector<Example> results;
    Example row;
    SQLHENV V_OD_Env;            // ODBC environment handle
    RETCODE V_OD_erg;            // result of ODBC calls
    SQLHDBC V_OD_hdbc;           // connection handle
    SQLHSTMT V_OD_hstmt;         // statement handle
    SQLINTEGER V_OD_status[10];  // length/status indicators (SQL_NULL_DATA or not)
    char string_buffer[200];
    TIMESTAMP_STRUCT timestamp_buffer;

    V_OD_Env  = DBConnection::GetDefaultConnection().GetHENV();
    V_OD_hdbc = DBConnection::GetDefaultConnection().GetHDBC();
    V_OD_erg = SQLAllocHandle(SQL_HANDLE_STMT, V_OD_hdbc, &V_OD_hstmt);
    if ((V_OD_erg != SQL_SUCCESS) && (V_OD_erg != SQL_SUCCESS_WITH_INFO))
    {
        cout << "Error. Unable to alloc handle" << endl;
        exit(0);
    }

    SQLBindCol(V_OD_hstmt, 1, SQL_C_SSHORT,    &row.exampleInt,    0, &V_OD_status[0]);
    SQLBindCol(V_OD_hstmt, 2, SQL_C_CHAR,      string_buffer,      sizeof(string_buffer), &V_OD_status[1]);
    SQLBindCol(V_OD_hstmt, 3, SQL_C_DOUBLE,    &row.exampleDouble, 0, &V_OD_status[2]);
    SQLBindCol(V_OD_hstmt, 4, SQL_C_SLONG,     &row.exampleLong,   0, &V_OD_status[3]);
    SQLBindCol(V_OD_hstmt, 5, SQL_C_TIMESTAMP, &timestamp_buffer,  0, &V_OD_status[4]);

    V_OD_erg = SQLExecDirect(V_OD_hstmt,
        (unsigned char *)"SELECT INT_VALUE, STRING_VALUE, DOUBLE_VALUE, EXAMPLE_LONG, "
        "EXAMPLE_DATE FROM DB_EXAMPLE", SQL_NTS);
    if ((V_OD_erg != SQL_SUCCESS) && (V_OD_erg != SQL_SUCCESS_WITH_INFO))
    {
        cout << "Error. Unable to execute statement" << endl;
        exit(0);
    }

    while (SQLFetch(V_OD_hstmt) != SQL_NO_DATA)
    {
        row.exampleStr  = string_buffer;
        row.exampleDate = timestamp_buffer;
        results.push_back(row);
    }
    SQLFreeHandle(SQL_HANDLE_STMT, V_OD_hstmt);
    return results;
}
void test_read()
{
    struct _timeb start, finish;
    vector<Example> examples;
    int i;

    // read initial data from table
    _ftime( &start );
    examples = RawReadDataDummy(); // RawReadDataDummy() is the same as RawReadData() -- see below for an explanation
    _ftime( &finish );
    cout << "Dummy: " << finish.time*1000 + finish.millitm
            - start.time*1000 - start.millitm << endl;

    // read data from table using the raw ODBC API
    cout << "Examples read from database using raw ODBC API:" << endl;
    _ftime( &start );
    for (i = 0; i < 10; i++)
        examples = RawReadData();
    _ftime( &finish );
    cout << "Raw: " << finish.time*1000 + finish.millitm
            - start.time*1000 - start.millitm << endl;

    // read data from table using DBView
    cout << "Examples read from database using DBView:" << endl;
    _ftime( &start );
    for (i = 0; i < 10; i++)
        examples = ReadData();
    _ftime( &finish );
    cout << "Read Data: " << finish.time*1000 + finish.millitm
            - start.time*1000 - start.millitm << endl;
}
In running these tests you will notice a call to a function called RawReadDataDummy. The reason for this is that when we first ran our tests against MS SQL Server 7.0, we noticed that whichever routine was called first would always run more slowly. It turns out that, in the way the SQL Server 7 ODBC driver works, when the first statement is executed the SQL is parsed and all 32 records from the DB_EXAMPLE table are stored into a buffer. On our test machine, populating this buffer takes about 150 ms, whereas transferring the contents of the buffer to a local vector requires about 2 ms. Furthermore, any subsequent ODBC call that executes the same SQL select statement can simply reuse the driver's data buffer. So, in order not to distort the performance metrics, we measure the cost of this first fetch separately in a routine called RawReadDataDummy.
Test results from running the program four times on a 333 MHz Pentium computer with SQL Server 7.0 and the above code. All times are in milliseconds as output:
Function                       | Run 1 | Run 2 | Run 3 | Run 4 |
RawReadDataDummy (called once) | 160   | 150   | 160   | 190   |
RawReadData (called x10)       | 18.1  | 19.1  | 18.0  | 19.0  |
ReadData (called x10)          | 68.1  | 67.1  | 68.1  | 68.1  |
This gave us a feel for the overall performance difference between the two methods. The next question was how much of this difference is a fixed cost (initialization of the DBView and iterator) and how much is a variable cost (i.e. a per-row overhead on each read). To answer this, we ran the code through NuMega TrueTime™, which measures the time spent on a line-by-line basis in the functions above. Working through the profiler's numbers, 72.4% of the function time for RawReadData was spent in the fetch loop, versus 56.3% of the function time for ReadData. This gives us:
Function    | Fetch loop time (ms) | Time outside fetch loop / fixed overhead (ms) |
RawReadData | 18.5 * .724 = 13.40  | 18.5 * .276 = 5.11                            |
ReadData    | 68.1 * .563 = 38.34  | 68.1 * .437 = 29.76                           |
Looking at the fetch loop, the extra time incurred by the ReadData routine is 38.34 - 13.40 = 24.94 ms. Since ReadData was called 10 times and each call fetched 32 rows (320 rows in total), this gives a per-row overhead of about 0.08 ms when using a DBView versus the raw ODBC API calls shown above. Similarly, the fixed, construction-time overhead of using a DBView works out to (29.76 - 5.11) / 10, or about 2.46 ms per call.
There is a performance price to be paid for using the DBView code. However, the per-row overhead is relatively small, on the order of 0.08 ms per row on our 333 MHz test machine. Most of the overhead comes from the constructor calls associated with creating a DBView and a select iterator, on the order of 2.46 ms. This construction overhead is in turn small compared to the time taken to parse a query and read rows from the database, which is on the order of 150 ms in RawReadDataDummy above. So the most significant overhead is the construction cost for a DBView, but for the majority of applications this will not be noticeable, as it is dwarfed by the time taken to process the underlying query against the database.
Copyright © 2002, Michael Gradman and Corwin Joy.
Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appears in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Corwin Joy and Michael Gradman make no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.