Data Access Technology Survey

The field of data access is large and confusing. So much so that I think it is best if we start with a map to orient ourselves. Table 14-1 lists these properties of many (though by no means all) common data access technologies:

COM+ support. A technology is said to support COM+ if it manufactures resources through a Resource Dispenser. In the case of transactional resources, such as RDBMS connections, an RM must be provided and the Resource Dispenser must perform automatic transaction enlistment.

Provider-specific API. A technology is said to be provider-specific if it does not support data access through an open standard but instead allows access to only a restricted set of data sources through a proprietary interface. ODBC-based open technologies allow access to data that is backed by some type of SQL query engine. OLE DB-based technologies remove even that restriction and allow access to a much larger variety of data sources.

Programming model level. This is the level of abstraction at which the technology offers data access and manipulation to clients. Factors such as exposure of raw handle types, existence of a navigable object model, and exposure of memory buffers play a role in whether an access technology qualifies as high level.

Interface type. An access technology can be accessible only by a certain programming language, or it might offer COM+ interfaces.

COM+ support anchor. Many high-level data access technologies are layered on top of low-level ones. This column lists the level in the hierarchy at which the COM+ support is provided.

Table 14-1 Database Interfaces and Their Fundamental COM+-Related Properties

API or Object Model	COM+ Support	Provider- Specific API	Programming Model Level	COM+ Interface Type	Support Anchor
ODBC	Yes	No	Very low	C	Native
OLE DB	Yes	No (and not SQL specific)	Low	COM+, with pointers	Native
RDO	Yes	No	High	COM+	ODBC
DAO with ODBC- Direct	Yes	No	High	COM+	RDO and thus ODBC)
DAO with Jet work- space	No	Jet-connected and installable ISAM data sources	High	COM+	N/A
ADO	Yes	No (and not SQL specific)	High	COM+ (including Java through WFC)	OLE DB
JDBC (via JDBC- ODBC bridge)	Yes	No	High	Java	ODBC
ATL	Yes	No (and not SQL specific)	High	C++	OLE DB
MFC	Yes	No	High	C++	ODBC
ESQL	No	No	Query language	Language extension through precompilation	N/A
Microsoft SQL Server DB-Library	No	Yes	Very low	C	N/A
Oracle Pro*C	No	Yes	Query language	C language extension through precompilation	N/A
OCI	No	Yes	Very low	C	N/A
OO4O	Yes	Yes	High	COM+	Native

Support that is not provider specific for COM+ is rooted in two technologies: ODBC and OLE DB. I will discuss OLE DB in the next section. ODBC is an older standard for interoperation with relational data stores. I do not know of an RDBMS out there today that does not support ODBC access. When compared to OLE DB, the large number of ODBC-compliant RDBMSs is the advantage of ODBC. However, there is an adapter (referred to as a provider in OLE DB lingo) that converts an ODBC data source to an OLE DB data source. Clients written to the OLE DB specification therefore can take advantage of more data sources than those written to the ODBC specification, at least on Microsoft Windows platforms.

ODBC's driver manager is a Resource Dispenser as of version 3.0 of ODBC. This means that connections to ODBC data sources are pooled and automatically enlisted in transactions if you have version 3 or later of ODBC installed. In order for this to work, some cooperation also is needed from the ODBC driver you are using. In particular, the driver must support the attribute SQL_ATTR_ENLIST_IN_DTC in the function SQLSetConnectionAttr of the driver API.

While the ODBC interface is too low level to allow for productive development and maintenance of modern applications, its support for COM+ is still crucial. There is a lot of source code out there that accesses the data store either directly through ODBC or through a technology layered on top of ODBC. This code exists in various forms, including C function libraries, C++ object hierarchies, and unconfigured legacy COM objects. Because ODBC now offers a Resource Dispenser, all this software can be reused without change and can benefit from transaction enlistment and connection pooling. All you have to do is invoke such legacy software in the context of a transactional COM+ object and voila!—your performance might increase, and the work done by your legacy code will be part of the distributed transaction umbrella with ACID properties orchestrated by your COM+ object.

Technologies layered on top of ODBC include Data Access Objects (DAO), but only when used with ODBC-Direct; Remote Data Objects (RDO); Java Database Connectivity (JDBC); and the MFC classes CDatabase and CRecordset. MFC also offers a set of separate wrapper classes of DAO for C++. Among these second-tier ODBC encapsulation technologies, DAO is the oldest. Microsoft built it originally as a high-level interface for COM clients such as Microsoft Visual Basic to allow access to its Jet desktop database product. Next came RDO, which is a COM object wrapper library around ODBC. With RDO, clients could reach a greater number of local and remote data sources. ODBC-Direct was invented to make this expanded access available to existing DAO clients. In an ODBC-Direct workspace, DAO becomes just a wrapper of RDO. The two MFC ODBC wrapper classes form a very thin veneer on top of ODBC, benefiting C++ desktop applications. An advantage of the MFC approach is that it allows direct access to the ODBC API through the
wrapper classes. It therefore yields the benefit of elevating the programming model without the loss of flexibility. JDBC remains a strong option even today because it is the only fully portable data access solution for Java. However, when portability is not a concern, the more feature-rich ADO available through the Windows Foundation Classes for Java (WFC) is a better choice.

OLE DB has provided a Resource Dispenser since its inception. Even though the OLE DB interface is COM+ based, it is still not high level enough to make for maintainable application code. In addition, OLE DB COM+ interfaces contain constructs that cannot be consumed by all COM+ client development systems, in particular Visual Basic (see Chapter 5), and require that clients provide memory buffers with instructions to OLE DB on how to align data from providers within them. This data access idiom of data to memory location binding is certainly not new; it has existed in the earliest versions of proprietary, low-level C interfaces to relational databases. But this idiom is too error prone for business logic development in any modern programming language. OLE DB lies at the core of all data access in the development of modern applications on Windows platforms, and it is the level at which COM+ integration is provided by a Resource Dispenser. Clients normally interface with a wrapper technology on top of OLE
DB.

Such wrapper technologies include the ActiveX Data Objects for all COM+ clients and the Active Template Library (ATL) OLE DB consumer templates, exclusively for C++. ADO provides a navigable object model through which many (but not all) OLE DB features can be exploited. ATL offers a set of templates in the tradition of generic programming. ATL operates at roughly the same level of abstraction as ADO but nevertheless offers faster access and a number of other significant benefits that are especially important to C++ clients. ATL also allows unrestricted access to the full set of OLE DB functionality. I will discuss specific C++ data access considerations later, in the "C++ Data Access" section of this chapter.

By far the two most popular database products used in COM+ projects are Oracle's RDBMS and Microsoft's SQL Server. Like SQL Server, Oracle offers a native OLE DB provider with version 8i, release 2.^{3 Oracle has implemented this provider directly on top of its low-level Oracle Call Interface (OCI). It now supports updateable and scrollable rowsets, Windows-integrated authentication, access to PL/SQL stored procedures, and rowsets returned from both stored procedures and functions, among other features. Both technologies layered on top of OLE DB—ADO and the ATL consumer templates—allow developers to implement straightforward and maintainable code that can access Oracle databases on Windows and other platforms.}

In addition to the OLE DB provider, Oracle offers Oracle Objects for OLE (OO4O), a set of COM+ objects whose interfaces can be consumed by any client. OO4O has a native Resource Dispenser and is therefore fully integrated with COM+. There are some trade-offs with using OO4O, however. On the plus side, OO4O provides access to Oracle RDBMS-specific features. Oracle also reports better performance with OO4O than the performance you would achieve when going through layers of OLE DB software. On the minus side, fewer developers are familiar with OO4O than, for example, ADO. Therefore, you might need to invest more in training over time. More significantly, OO4O is proprietary technology offering access to Oracle RDBMS products only, both now and in the future. By using OO4O, you lock yourself into using Oracle software. If you are comfortable with that, OO4O is a good option for COM+ projects.

Like all databases, Oracle also offers lower-level interfaces. These include the C-based Pro*C and OCI. Applications theoretically can be written directly against those interfaces. A number of databases, including SQL Server, support the old Embedded SQL (ESQL) standard, a mechanism that embeds SQL statements directly into the source code of a programming language and uses a preprocessor to make those statements compilable in that language. This is exactly how Oracle's Pro*C works, although Pro*C can be used only in C and C++ source code. SQL Server also supports its own low-level flat API, called DB-Library.

All of these data access interfaces are so low level that COM+ integration cannot be provided. No Resource Dispensers exist at that level, and services such as connection pooling, which theoretically could be implemented independent of a COM+ Resource Dispenser, are not offered either. These APIs serve as the lowest level RDBMS interface on which middle-tier services such as COM+ integration are meant to be layered. You might consider these APIs the assembly language of data access. Assembly language is the interface to a processor on which you layer programming languages. You wouldn't consider implementing your business logic in assembly, would you? I discourage the use of any data access technology that does not provide COM+ support in a COM+ project. Avoid especially the low-level interfaces because they tend to degrade source code maintainability by another level of magnitude over that caused by forcing business logic to shoulder the burden of manual transaction management or ACID behavior
simulation. Also, as with any database interface without COM+ support, they force developers to think about controlling database connections efficiently—implementing manual connection management might require substantial effort.

When integrating legacy code into a COM+ project, you will not have much of a choice to make when it comes to selecting a data access method. If the legacy code is ODBC based, you stand a good chance of being able to integrate it with only moderate effort. If the legacy code is not ODBC based, leveraging the benefits of distributed transactions with ACID properties and heterogeneous participants as well as gaining the advantages of centralized resource pooling could be very difficult. But what about brand-new COM+ code? Given all these options for COM+ integration, which one is best?

I suggest these guidelines: First, recognize that the future of data access on Windows platforms rests firmly on the OLE DB foundation. OLE DB (with the help of the ODBC provider) is by far the most universal and widespread data access interface on Windows today. Expect new types of data sources to appear in the Windows market with OLE DB providers first, and perhaps no other interface at all. Second, make a choice that suits the development environment you use. The following are good combinations:

Use ADO in Visual Basic components.

Use ATL OLE DB consumer templates in Visual C++ components. I will discuss conditions for using ADO in C++ and options for doing so later in this chapter, in the "C++ Data Access" section.

Use ADO through WFC in J++ components. Use JDBC if Java code must be portable.

Consider OO4O in any development environment for access to those data items that are tied to Oracle products.

Programming Documents

Sunday, October 18, 2009

Data Access Technology Survey

Data Access Technology Survey

No comments:

Post a Comment

Blog Archive

About Me

Followers

Link