Saturday, September 20, 2008

Slice : OpenJPA for Distributed Persistence

Slice is now available as an integral part of the OpenJPA 1.2.0 release. Slice extends OpenJPA to transact and query against distributed, horizontally partitioned, possibly heterogeneous databases. Built on OpenJPA's feature derivation framework, Slice lets any existing OpenJPA-based application originally developed for a single database move transparently to a configuration where data is partitioned among multiple databases, without any change to the existing application.

Data partitioning is an effective scaling strategy against growing data volume. Many data sets are naturally amenable to partitioning by geographical region (e.g. Homes in each State), by temporal interval (e.g. Orders in each Month), or by the very nature of the application, as in multi-tenant, Software-as-a-Service hosting platforms. Because the data is distributed by month or state among different databases, and Slice executes all critical database operations such as flush and commit in parallel, the scaling characteristics are determined by the size of the largest database partition rather than by the size of the entire data set. Moreover, Slice supports aggregate query operations such as SUM or MAX, so that a standard JPQL query such as

select MAX(h.price) from Home h

will issue identical queries in parallel across multiple databases, each storing data on Homes in an individual state, then find the maximum of the per-database results and return that single maximum value as the result of the query.
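The merge step can be pictured with a small, self-contained sketch. This is not Slice's internal code; it only illustrates the scatter-gather idea in plain Java. The class name ScatterGatherMax and the Callable stand-ins for per-partition queries are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: each Callable stands in for running
// "select MAX(h.price) from Home h" against one database partition.
class ScatterGatherMax {

    static double maxAcrossPartitions(List<Callable<Double>> partitionQueries)
            throws Exception {
        // One worker per partition so the per-partition queries run in parallel.
        int workers = Math.max(1, partitionQueries.size());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Scatter: submit every per-partition query and wait for all of them.
            List<Future<Double>> partials = pool.invokeAll(partitionQueries);

            // Gather: the global MAX is the maximum of the partial maxima.
            double max = Double.NEGATIVE_INFINITY;
            for (Future<Double> partial : partials) {
                max = Math.max(max, partial.get());
            }
            return max;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Double>> queries = new ArrayList<>();
        queries.add(() -> 250000.0); // made-up partial MAX from one partition
        queries.add(() -> 410000.0); // made-up partial MAX from another partition
        System.out.println(maxAcrossPartitions(queries));
    }
}
```

The same pattern applies to SUM, where the partial results are added rather than compared. An average could not be merged this way from partial averages alone, since each partition's row count would also be needed.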

But what about newly created instances? Which database partition will store a new record? The application itself decides this by implementing a single method of the DistributionPolicy interface. The contract is simple: given a new persistent object as the input argument, the method returns the name of the database partition that should store it. When Slice encounters a new object during commit, it calls the user-defined DistributionPolicy implementation and stores the new record in the appropriate partition. Slice also tracks the database origin of each persistent instance as it is loaded from its partition, so when the application modifies an instance in a transaction and commits, Slice knows exactly which database partition will receive the update.
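A policy along these lines might look like the sketch below. To keep it compilable without OpenJPA on the classpath, it declares a local stand-in interface. The method shape distribute(Object, List<String>, Object) is my assumption about what org.apache.openjpa.slice.DistributionPolicy looks like, so check it against the release you use. The Home class and StateDistributionPolicy are hypothetical names for illustration.

```java
import java.util.List;

// Local stand-in for Slice's DistributionPolicy. The signature is an
// assumption; verify it against org.apache.openjpa.slice.DistributionPolicy.
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

// Hypothetical persistent class; a real entity would carry JPA annotations.
class Home {
    private final String state;
    private final double price;

    Home(String state, double price) {
        this.state = state;
        this.price = price;
    }

    String getState() { return state; }
    double getPrice() { return price; }
}

// Routes each new Home to a partition chosen from its state, so that all
// Homes of one state land in the same database.
class StateDistributionPolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        if (!(pc instanceof Home)) {
            // Arbitrary choice for this sketch: anything else goes to the first slice.
            return slices.get(0);
        }
        String state = ((Home) pc).getState();
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(state.hashCode(), slices.size());
        return slices.get(index);
    }
}
```

Hashing the state is only one option. A lookup table from state to slice name would give explicit control over placement, at the cost of maintaining the table. Note that a hash-based choice changes when the number of slices changes, so it suits a fixed set of partitions.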

2 comments:

Unknown said...

Is it possible to configure Slice with a DataSource rather than a ConnectionURL?
What could I use instead of the slice.One.ConnectionURL property?

property name="slice.One.ConnectionURL" value="jdbc:mysql://localhost/slice1"

Pinaki Poddar said...

Hi,
Not tested.

But you can try openjpa.slice.One.ConnectionDriverName=com.acme.MyDataSource(url="xyz")

where com.acme.MyDataSource is the fully-qualified name of a class that implements javax.sql.DataSource and supports plugin-style properties such as setUrl(...).