An Enterprise-ready, Simple, Performant, Scalable, Platform-independent EJB Primary Key Generator

HIGH/LOW Singleton+Session Bean Universal Object ID Generator

Summary:

Well, as Scott Ambler writes in his article "Enterprise-Ready Object IDs", whichever way you look at it, most of us who develop in the EJB world use relational databases to store objects. And whereas relational databases need keys, objects don't.

Here is a proposal for an Enterprise-ready, Simple, Performant, Scalable, Platform-independent EJB Primary Key Generator.

Content:

An Enterprise-ready, Simple, Performant, Scalable, Platform-independent EJB Primary Key Generator

Forenote: this article refers extensively to Scott Ambler's article titled "Enterprise-Ready Object IDs" and to TheServerSide's pattern "Entity Bean Primary Key Generator". It is strongly advised to take a look at these before reading further. This article has been published on TheServerSide as a separate pattern, where it has been discussed further.

THE PROBLEM: OBJECT ID... WHY DO I NEED ONE?

Well, as Scott Ambler writes, whichever way you look at it, most of us who develop in the EJB world use relational databases to store objects. And whereas relational databases need keys, objects don't.

Keys that have business meaning are intrinsically a bad idea since the meaning might change, therefore the database representation might need to change, which means... trouble. Hence the need for some meaningless key.

WHY DO EXISTING SOLUTIONS NOT WORK?

We want an Enterprise-ready, simple, performant, scalable, platform-independent (aka vendor-independent) OID generator. So what's wrong with what is already there?

DATABASE CENTRIC SURROGATE KEYS

Incremental keys, when they are supplied by your database, are vendor specific. Their implementation changes from vendor to vendor and therefore using them induces a certain degree of vendor lock-in, which is not always easy to overcome.

UUIDs, GUIDs AND OTHER UNIVERSAL STUFF

The Open Software Foundation's Universal Unique IDentifiers (UUIDs), Microsoft's Globally Unique IDentifiers (GUIDs) and the other unique ID generators (such as the RMI-based UID generator) have some or all of the following drawbacks (refer to Scott Ambler's article for more info):

- They are often predicated on the concept of your application actually being in communication with your database.

- They break down in multiple-database scenarios.

- Obtaining a key causes a dip in performance.

- There are minor compatibility glitches.

- They are not scalable. Since the generation of GUIDs and UUIDs depends on a timestamp with a precision of a thousandth of a second, there is a risk of overlap, however minuscule.

SO WHAT WORKS, THEN?

The proposal Scott Ambler made in his article was to use a HIGH-LOW strategy. A key built this way consists of two logical parts: a HIGH value that comes from a source common to all object ID generators, and a LOW value produced by your local object ID generator.

The HIGH values are more expensive to retrieve as they have to be fetched from a central source available to all, but are unique to each OID generator. The LOW value is initialised and incremented by the generator itself, locally, which makes it easy and fast to manage and obtain. The concatenation of the HIGH and the LOW makes a unique key.
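The idea can be illustrated with a deliberately small sketch. It is not the implementation discussed in this article: the class names are hypothetical, the central source is an in-memory counter standing in for a database row, and HIGH and LOW are packed into a single long (with a 16-bit LOW) instead of a 128-bit key.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the central HIGH source (a database row in practice). */
class CentralHighSource {
    private final AtomicLong nextHigh = new AtomicLong(0);

    /** Hands out each HIGH value exactly once. */
    long reserveHigh() {
        return nextHigh.getAndIncrement();
    }
}

/** One local generator: owns a HIGH value and counts LOW values by itself. */
class HighLowGenerator {
    private static final int LOW_BITS = 16;
    private static final long MAX_LOW = (1L << LOW_BITS) - 1;

    private final CentralHighSource source;
    private long high;
    private long low;

    HighLowGenerator(CentralHighSource source) {
        this.source = source;
        this.high = source.reserveHigh(); // expensive call, made rarely
        this.low = 0;
    }

    synchronized long nextId() {
        if (low > MAX_LOW) {              // LOW range used up: reserve a new HIGH
            high = source.reserveHigh();
            low = 0;
        }
        return (high << LOW_BITS) | low++; // concatenation of HIGH and LOW
    }
}
```

Two generators sharing the same source never collide, because each works inside its own HIGH range and only goes back to the source once every 65536 keys.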

This provides an enterprise-wide unique object identifier... but not a universal one. Two different companies could use this very method to generate their keys and end up with the same keys, unique in their own little world but duplicated in the real world. To make an Object ID truly Universally Unique, we could add an identifier unique to the company, such as the company domain name.

THE IMPLEMENTATION

The code is available at SourceForge: http://sourceforge.net/projects/ejbutils/. It has been tested on WebSphere 3.5 and it works great. If you want to participate in the development, do not hesitate to drop me a line through SourceForge (click on my name in the project admins box on the top right of the window).

The solution provided here uses a combination of a singleton and a stateless session bean:

- the session bean fetches the high key from the database

- the singleton is responsible for: 1) asking the session bean for the next HIGH value when necessary; 2) determining the LOW value and the unique identifier; 3) building and returning a full UOID (HIGH + LOW + identifier).

The solution has been implemented using the following guidelines:

- Use a key composed of a 112-bit HIGH key, a 16-bit LOW key and a unique enterprise identifier (as per Scott Ambler's article).

- Use a byte array to represent the HIGH and LOW keys. The byte arrays are encapsulated in a Key class that implements various functions such as incrementing the key, converting it to and from a String, etc. I think this solution is acceptable performance-wise, although I did not study the problem thoroughly.

- Store the key as a String in the database (as per Scott Ambler's article). The bytes are converted to a simplified hex format. This is a single source implementation (we avoided the multiple sources implementation and its associated algorithm).

- The UOID generator is not class specific. It generates UOIDs using the same rules for all classes and does not contain class-specific information (as per Scott Ambler's article).

- Create the HIGH key automatically in the database if it is not found.

- Use the singleton/factory pattern. This means that there will be one factory per JVM.

[ NOTE: As mentioned in various postings, some EJB servers, such as SilverStream and Gemstone, create and destroy JVMs dynamically. I took the same position as Scott Ambler when he says in his article: "Yes, this is wasteful, but when you are dealing with a 112-bit HIGHs, who cares?" If this is really an issue, decisions might need to be taken at the level of such servers' configuration, but this is a separate discussion altogether.]

- To avoid hotspot problems in database index searches, caused by keys that start with a long sequence of identical characters, the key is incremented in reverse order. This is a first attempt to prevent the problem from appearing in the early stages of the database's life.

This means that instead of incrementing like this:

00000000 00000000 ... 00000000 00000000 -> 00000000 00000000 ... 00000000 00000001

and:

00000000 00000000 ... 00000000 11111111 -> 00000000 00000000 ... 00000001 00000000

the Key class would increment the following way:

00000000 00000000 ... 00000000 00000000 -> 00000001 00000000 ... 00000000 00000000

and:

11111111 00000000 ... 00000000 00000000 -> 00000000 00000001 ... 00000000 00000000
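A minimal sketch of such a reverse-order increment is shown here. ReverseKey is a hypothetical name used for illustration only, not the Key class of the project, and the overflow handling is an assumption.

```java
/** Hypothetical sketch of a byte-array key incremented from the first byte. */
class ReverseKey {
    private final byte[] bytes;

    ReverseKey(int length) {
        this.bytes = new byte[length]; // starts at all zeros
    }

    /** Adds one, starting at index 0 and carrying towards the last byte. */
    void increment() {
        for (int i = 0; i < bytes.length; i++) {
            bytes[i]++;
            if (bytes[i] != 0) {
                return; // this byte did not wrap around, so there is no carry
            }
        }
        // Every byte wrapped to zero: the key space is exhausted.
        throw new IllegalStateException("key overflow");
    }

    /** Two hex characters per byte, in storage order. */
    String toHex() {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

With this ordering the leading characters of the stored string are the ones that change most often, so consecutive keys do not share a long common prefix.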

- Looking for a new HIGH key involves retrieving the value from the database, incrementing it and updating the row with the incremented value. While performing this we need to have exclusive access to the table row.

Using an Entity Bean with a serializable transaction for this is inadequate for several reasons:

1) to guarantee exclusive access to the table during a serializable transaction, your EJB server has to implement a pessimistic concurrency control algorithm, and some do not (Oracle, for example);

2) performance-wise, serializable transactions are costly;

3) getting the next HIGH key looks more like a service than an object (it "provides" the next available key) and therefore should really be implemented as a Session Bean.

The only cross-server compliant way to ensure exclusive access to the table row is to force an exclusive lock in the database using a SELECT FOR UPDATE statement and then use an UPDATE statement to store the incremented value of the key. This solution uses one simple transaction and does not require the bean transaction to be serializable, which gives it better performance. In fact, since isolation is taken care of at the database level, the lowest isolation level (read uncommitted) is acceptable. Also, this solution does not perform a lock promotion, so there is no risk of deadlock.

We only need a stateless Session Bean as there is no state to remember.

To make it highly available we make sure a new transaction is created when using the Session Bean.

This makes the overall solution lightweight and efficient.

To summarise, the alternative is:

1) Use a stateless Session Bean.

2) Use a SELECT FOR UPDATE statement followed by an UPDATE statement to get the HIGH key and update it.

3) Set the transaction isolation level to TX_READ_UNCOMMITED and its attribute to TX_REQUIRES_NEW.

- Finally, in the implementation, we make sure AUTOCOMMIT is turned off and then set back to its initial value. This is because an exception is otherwise thrown when using SELECT FOR UPDATE on an Oracle 8 database, as kindly mentioned by Weicong Wang. UDB 6.1 users beware: tampering with AUTOCOMMIT causes an exception to be thrown if your isolation level is TX_SERIALIZABLE.
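The SELECT FOR UPDATE / UPDATE sequence and the AUTOCOMMIT handling described above can be sketched in plain JDBC as follows. This is an illustration under several assumptions, not the code of the session bean: the table UID_HIGH_KEY, its column HIGH_VALUE and the class HighKeyFetcher are hypothetical names, the key is incremented as an ordinary hex number rather than in reverse order, and a missing row raises an error instead of being created. The sketch commits on the connection itself; inside a container-managed transaction that would be left to the container.

```java
import java.math.BigInteger;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class HighKeyFetcher {
    // Hypothetical single-row table and column names.
    private static final String SELECT_SQL =
            "SELECT HIGH_VALUE FROM UID_HIGH_KEY FOR UPDATE";
    private static final String UPDATE_SQL =
            "UPDATE UID_HIGH_KEY SET HIGH_VALUE = ?";

    /** Plain hex increment that keeps the original width (simplification). */
    static String increment(String hex) {
        String next = new BigInteger(hex, 16).add(BigInteger.ONE).toString(16);
        StringBuilder sb = new StringBuilder();
        for (int i = next.length(); i < hex.length(); i++) {
            sb.append('0'); // restore leading zeros dropped by BigInteger
        }
        return sb.append(next).toString();
    }

    /** Returns the current HIGH value and stores the next one under a row lock. */
    static String fetchAndAdvance(Connection con) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // the row lock must be held until the UPDATE
        try (PreparedStatement select = con.prepareStatement(SELECT_SQL);
             PreparedStatement update = con.prepareStatement(UPDATE_SQL)) {
            String current;
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("HIGH key row not found");
                }
                current = rs.getString(1);
            }
            update.setString(1, increment(current));
            update.executeUpdate();
            con.commit(); // ends the transaction and releases the lock
            return current;
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit); // back to the initial setting
        }
    }
}
```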

HOW TO USE IT AND HOW DOES IT WORK?

Let's suppose you want your entity bean to take advantage of the UOID generator. Your entity bean will have a field called uoid. It will have to implement the following code in its ejbCreate method:

public void ejbCreate() throws javax.ejb.CreateException {
    // get the singleton
    UIDDispenser dispenser = UIDDispenser.getDispenser();
    try {
        // ask the dispenser for the next unique object ID
        uoid = dispenser.getNextId();
    } catch (org.ejbutils.uid.UIDDispenserException e) {
        // report the failure as a creation error
        throw new javax.ejb.CreateException("Problem with the UIDDispenser : " + e.getMessage());
    }
}

You can have a number of ejbCreate methods to initialise your entity bean differently. Simply make sure that each such method calls this method first.

The UIDDispenser builds the UOID as follows. First it checks whether it has an assigned HIGH value.

- If it has not, it asks the session bean (here called UIDHighKeyGenerator) to provide one. The UIDHighKeyGenerator gets the HIGH value from the database table and returns it to the UIDDispenser; in the process it increments the HIGH value in the table, all in a transactionally safe way. The UIDDispenser stores the HIGH value and initialises its LOW value to zero.

- If it has, it checks whether its LOW value has reached its maximum. If it has, the dispenser gets a new HIGH value from the session bean and initialises its LOW value to zero. If it has not, it increments the LOW value.

Finally, the UIDDispenser just has to combine HIGH + LOW + unique identifier and return the result as a String.
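The decision flow above can be sketched as follows. This is a simplified illustration, not the actual UIDDispenser: SketchDispenser is a hypothetical class, a java.util.function.Supplier stands in for the UIDHighKeyGenerator session bean, the HIGH value is assumed to arrive already formatted as a 28-character hex string, and the LOW value is a plain int rather than a byte array.

```java
import java.util.function.Supplier;

/** Hypothetical, simplified dispenser following the flow described above. */
class SketchDispenser {
    private static final int MAX_LOW = 0xFFFF;  // largest 16-bit LOW value

    private final Supplier<String> highSource;  // stands in for the session bean
    private final String identifier;            // e.g. a company domain name
    private String high;                        // null until the first request
    private int low;

    SketchDispenser(Supplier<String> highSource, String identifier) {
        this.highSource = highSource;
        this.identifier = identifier;
    }

    synchronized String getNextId() {
        if (high == null || low == MAX_LOW) {
            high = highSource.get(); // expensive: goes to the central source
            low = 0;
        } else {
            low++;                   // cheap: purely local
        }
        // HIGH + LOW (4 hex characters) + unique identifier
        return high + String.format("%04x", low) + identifier;
    }
}
```

In this sketch the central source is only consulted on the first request and then once every 65536 identifiers; every other call is a local increment and a string concatenation.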

POTENTIAL DRAWBACKS

A uoid is a combination of a 112-bit (or 14-byte) HIGH value, a 16-bit (or 2-byte) LOW value and a unique identifier. The string representation of a byte used here is made of 2 chars, making the uoid a string of 32 + <length of the unique identifier> characters. Some DB administrators objected to that. I have not studied the problem thoroughly. You should probably check whether this is really a problem and study the trade-offs.
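To help weigh that trade-off, the arithmetic can be written down as a small helper. KeySizing and its method names are hypothetical and exist only for this illustration; the figures follow directly from two hex characters per byte and eight bits per byte.

```java
import java.math.BigInteger;

/** Hypothetical helper relating key sizes to string length and capacity. */
class KeySizing {
    /** Characters needed for HIGH + LOW at two hex characters per byte. */
    static int hexLength(int highBytes, int lowBytes) {
        return 2 * (highBytes + lowBytes);
    }

    /** Number of distinct values a key of the given byte length can hold. */
    static BigInteger capacity(int keyBytes) {
        return BigInteger.ONE.shiftLeft(8 * keyBytes); // 2^(8 * keyBytes)
    }
}
```

For the sizes used here, hexLength(14, 2) is 32 and capacity(2) is 65536, the number of identifiers handed out per HIGH value. Shrinking either part shortens the string but reduces the corresponding capacity.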

CONCLUSION

This provides us with an Enterprise-ready, simple, performant, scalable, platform/vendor-independent UOID generator.

I should stress the "Enterprise-ready" nature of this solution. If you need a uoid in your application but you do not need it to be unique in the whole world, or 128 bits long so that you never see the end of it, you can still use the implementation provided here. For the first, you just change the code so that it does not add the unique identifier to the uoid. For the second, just modify the HIGH_KEY_BYTES or LOW_KEY_BYTES (not recommended for this one) static fields in the UIDDispenser to the value you feel is appropriate. A forthcoming version of this implementation will have options to let you do all this without modifying the code. Just be careful in your decision: you do not want another Y2K disaster for your application.

Emmanuel SCIARA