dtl

bulk_copy


Category: functors	Component type: concept

Prototypes

SELECT

template<class ItType, class DataObj> void bulk_copy(ItType &select_iterator, DataObj *begin, DataObj *end, bool bValidate = true, size_t row_size = sizeof(DataObj));

INSERT

template<class ItType, class DataObj> void bulk_copy(DataObj *begin, DataObj *end, ItType &insert_iterator, bool bValidate = true, size_t row_size = sizeof(DataObj));

Description

This algorithm uses bulk ODBC operations to read or write records. If you have many records that you need to read or insert, consider using this routine. The standard DTL iterators operate on one record at a time which is slower than working on many records at once. Bulk operations work on an array of records, saving time in both function calls and database access since the data can be buffered more efficiently. DTL supports bulk inserts and bulk fetches through the use of bulk_copy(). The bValidate parameter specifies whether to call a validation routine when reading or writing records. When bValidate is set to true, SelValidate() is called on each element for bulk selects and InsValidate()is called on each element for bulk inserts. To increase the speed of bulk_copy(), use char arrays or tcstring<N> to represent strings instead of std::string; this is because std::string really hurts performance since it must make many ODBC GetData() and PutData() calls to support arbitrary length strings. DTL provides a special IOHandler for bulk fetches called BulkFetchHandler which records the indexes of which elements read failed validation. NOTE! The bulk insert function uses an ODBC 3.0 feature called row-wise arrays of parameters to provide fast inserts. Not all drivers are fully ODBC 3.0 compliant so some ODBC drivers will fail when you attempt to use this method. For this reason, if your code needs to be portable across multiple driver types you will want to provide some kind of failover logic when you use this method.

Helper Functions

The raw bulk_copy() algorithms that DTL provides come with several inconveniences or restrictions:

The input/output range for the data in memory must always be a contiguous block, restricting the passed in range to be pointers into a C array or a std::vector. This makes the interface of the bulk algorithms not as flexible as standard STL algorithms which always accept iterators of certain categories.
Due to restrictions in ODBC for bulk operations, sizeof(DataObj) must always be greater than both the sizeof(SQLINTEGER) and sizeof(TIMESTAMP_STRUCT). This means the user needs to pad his DataObj with extra bytes at the end of the object for the bulk copy to work. Otherwise, he'll get a surprising exception saying that he needs to pad his object. The bulk_copy() algorithms allow the user to pass in a row size to tell the algorithm how big the object plus padding actually is to aid the processing of the bulk operation, but this doesn't make life much easier for the user. Thus, trying to bulk copy a DBView<int> would be impossible without wrapping the int along with padding in a struct and then bulk copying a view of that wrapper object.

DTL provides helpers both for bulk fetch and bulk insert that resolve both of these problems:

// buffer_size is the number of objects to copy in

// each call to bulk_copy() within the helper

template<class SelectIterator, class OutputIterator> 

   void bulk_fetch_helper(SelectIterator read_it, 

   size_t buffer_size, OutputIterator output);

template<class InputIterator, class InsertIterator> 

   void bulk_insert_helper(InputIterator beg, InputIterator end, 

   size_t buffer_size, InsertIterator ins_it);

The bulk fetch helper writes out to any output iterator while the bulk insert helper reads from any half-open range specified by two input iterators. The helpers also take care of padding the DataObj's with extra bytes internally so that the user doesn't have to clutter up the actual object structure with filler or worry about the padding exception from being thrown. For example, copying from a list of ints to the database is now as simple as creating a DBView<int> and calling bulk_insert_helper():

list<int> int_list;



// ... insert ints into int_list ...



// now write the ints to the DB in bulk

// write to table which has just single column INT_VALUE

// (will copy 100 ints in each internal bulk_copy() call in the helper)

int myInt;

DBView<int> int_view("INT_EXAMPLE",  BCA(myInt, COLS["INT_VALUE"] == myInt)); 

DBView<int>::insert_iterator ins_it(int_view);

bulk_insert_helper(int_list.begin(), int_list.end(), 100, ins_it);

The helpers have a lot of extra overhead than the raw bulk_copy() algorithms because the helpers must call the constructors and destructors manually for each object in the internal buffer it uses for its calls to bulk_copy() and the copying of objects into or out of the buffer into the input/output range passed in to the helper. So if you know you don't need the extra capabilities provided by the helpers, just use the raw bulk_copy() algorithms as they are much more efficient than the helpers.

Definition

Defined in the dtl_algo.h header file with calls appropriately forwarded to bulk_copy() members in select_iterator.h and insert_iterator.h.

Refinement of

None.

Associated types

select_iterator, insert_iterator, BulkFetchHandler

Template parameters

Parameter	Description	Default
`ItType`	The type of the DBView iterator to use for the bulk operation. For a bulk fetch, this type must be a select_iterator and the iterator's value type must be a DataObj. For bulk insert, the iterator type needs to be an insert_iterator and DataObj must be the iterator's value type.
DataObj	The data object type being read or written through the bulk operation. DataObj must not be variant_row.

Preconditions

The range specified by [begin, end) is a contiguous array of DataObj's in memory.
After the first call to bulk_copy() using a particular DataObj range, all future bulk_copy() calls using the same DTL iterator must use a DataObj range containing the same number of elements as in the initial call. Note that the actual range itself can be a different block of memory than passed to the first call.
In general, you must have sizeof(DataObj) > sizeof(SQLINTEGER) * # columns in view. This restriction is in place because of how DTL implements bulk_copy(). If your DataObj doesn't meet this preconditon, pad it with extra bytes at the end of the object.
Similarly, you must also have sizeof(DataObj) > sizeof(TIMESTAMP_STRUCT) * # of dates in view.

Example

// bulk select

Example examples[20]; // buffer used to hold example objects read from DB

DBView<Example> view("DB_EXAMPLE");

DBView<Example>::select_iterator sel_it = view.begin();

bulk_copy(sel_it, &examples[0], &examples[20]); // read 20 Examples

// ... do something with those 20 example objects ...

// ...

bulk_copy(sel_it, &examples[0], &examples[20]); // read 20 more Examples



// bulk insert

Example ins_examples[20]; // example objects to insert into the DB

DBView<Example> view2("DB_EXAMPLE");

DBView<Example>::insert_iterator ins_it = view;

// ... populate ins_examples with some data ...

// ...

bulk_copy(&examples[0], &examples[20], ins_it); // insert 20 Examples

Further Examples

One limitation of the raw bulk_copy algorithm when reading records is that it will read all records into a range regardless of whether they have passed validation or not.

Often what happens when using bulk_copy is that one wants to bulk fetch records into an intermediate buffer, then copy them into a final container if they have passed validation via SelValidate. We have written an algorithm to facilitate this common task called bulk_fetch_helper(ItType select_iterator, size_t buffer_size, OutputIterator output). To use this algorithm, your DBView must use BulkFetchHandler for the IOHandler. Here is an example of how to use this, see dtl_algo.h for details:

// Read the contents of the DB_EXAMPLE table and return a vector of the resulting rows

// Use DTL bulk fetch helper (to validate records along with the read)

vector<Example> ReadData() {

	vector<Example> results;

	typedef DBView<Example> DBV;

 	DBV view(DBV::Args().tables("DB_EXAMPLE")

		.handler(BulkFetchHandler<Example>())

   	);

 	// bulk fetch *all* records into the results container 

   	// use a buffer 128 rows long 

   	bulk_fetch_helper(view.begin(), 128, back_inserter(results));

   

   return results;

}