Query splitting & parallel execution

The next release of Livebase will generate applications with better performance on complex queries over large datasets. Here is the story behind this major improvement, and what’s in there for you.

The challenge

One of our customers managing a complex clinical database (with about 60 classes and almost 100 database tables) has modeled the main Patient class with dozens of derived attributes, i.e. attributes retrieving data from other classes through a query.

To make things worse, many of the Patient’s derived attributes are retrieved from classes connected to Patient class only indirectly, via long paths traversing several model associations, including many-to-many ones.

Apparently, our customer enjoyed the possibility to enrich the Patient class with additional derived “query” attributes (such as average blood parameters across exams) by just dragging attributes from other classes of the model onto the Patient class.

As a consequence, the query generated by Livebase for retrieving the patients included an impressive number of chained joins and grouping operators (avg, std, sum, count…). And no Relational DBMS likes this kind of queries, MySQL making no exception.

In fact, despite all the optimization techniques applied and all the indexes created by Livebase on the database, the query generated by the platform was quite slow as soon as the set of patients to retrieve was larger than a few thousands.

The strategy

After some experiments with views and temporary tables, we started working on a technique for breaking down the original query into multiple (simpler) sub-queries that could be executed in parallel more efficiently, and for recombining the results in memory.

The idea behind this technique is to execute a different sub-query to retrieve a subset of the patient’s derived attributes (i.e. a subset of the columns of the original recordset) connected through the same path over the data-model (i.e. retrievable via the same chain of relational joins over the database).

The main problem with this technique is how to split/replicate the original WHERE clause into the sub-queries. What if the original WHERE clause contains a condition based on derived attributes retrieved by different sub-queries?

After two weeks spent analyzing the problem and one week of prototyping, we ended up with a sophisticated algorithm for splitting the original query according to the individual conditions in the WHERE clause, and to the chain of joins required to retrieve each derived attribute.

The results

Retrieving nine thousands patients (with all their derived attributes) took about 40 seconds with the old single-query approach. Splitting the original query into multiple simpler sub-queries and combining their results in memory took about 2 seconds. A staggering 20x improvement.

Although such a dramatic performance improvement can be expected only on applications performing really complex queries, we believe that average Livebase applications with a smaller number of derived attributes will still provide sensibly better performances and better scalability over large datasets.

What’s next

The parallel sub-query technique will be adopted whenever applicable by all the applications generated by the next version of the Livebase platform, which will be released as soon testing will be completed (probably two weeks).

Livebase ver. 2.3.3 released

On July 13, 2011 we released in production the latest stable version of the Livebase platform, delivering a number of improvements to generated applications.

Improved concurrency control on bidirectional associations.

Livebase allows very easily to create applications with many-to-many associations that can be edited on both sides. For instance, you can have an Employee class, a Project class, and a many-to-many association between these classes: if the association is bidirectional, users can concurrently add/remove Projects from any given Employee and also add/remove Employees from any given Project. This means, for instance, that an Employee can be concurrently modified directly (by editing it) and indirectly (by removing it from any of the Projects he’s associated with).

Livebase also allows to define min and max cardinality constraints on both sides of each association (e.g. an Employee can work on no more than three Projects, and a Project must involve at least one and at most ten Employees). Enforcing cardinality constraints efficiently on bidirectional associations, under heavy concurrency conditions, requires a very sophisticated logic.

Upon user’s suggestion, some parts of the application-level locking system (originally designed with a pessimistic approach) have been refactored with an optimistic approach, which should deliver an higher degree of concurrency with a small risk of transactions rejection.

Improved objects-counting query generation

When a user wants to list all the objects of a given class (e.g. all the Orders), Livebase applications count first the number of objects visible to that user according to the database-partitioning rules defined on the specific application (e.g. show only Orders approved in the same month) and also according to the grants associated to the user’s role (e.g. show only Orders issued to the department the user belongs to). The preliminary query used to retrieve this preliminary count from the database is now optimized whenever possible (less joins) in order to be faster.

XML data representation added to the REST APIs

Every application generated by Livebase comes with a complete REST API to access data programmatically (only GET methods are currently supplied, PUT, POST and DELETE methods will be supported by the end of September). Up to now only the JSON format was available with the REST API to represent objects. With this release, XML is also available. The documentation on the web site will be updated as soon as possible with all the technical details on how to use the new XML representation.

New skin for the user interface of generated applications

In the process of creating a skinning framework for the applications generated by Livebase (allowing end-users to select their preferred skin at run-time) we have just released the first skin, which we believe is much nicer than the standard one provided by the GWT framework. Hope you like it!.

Fhoster partecipa all’evento R2B 2011

R2BFhoster parteciperà come espositore al R2B (Research To Business) di Bologna, 8 – 9 giugno 2011.
Saremo al padiglione 33 stand E8. Ogni ora dalle 10.00 alle 17.00 faremo una demo sulla nostra piattaforma Livebase e a tutti i partecipanti distribuiremo un voucher omaggio del valore di 300,00 € da spendere sulla piattaforma. Registrati ora!

Talk al Forum PA

Domani (mercoledì 11 maggio) alle ore 10 del mattino terremo una presentazione  di 50 minuti nella Sala A (anche indicata come Saletta 31) della Piazza dell’Innovazione del Forum PA (padiglione 8).  Chi non riuscisse a partecipare al talk può comunque trovarci al Desk 9 della stessa Piazza dell’Innovazione, dove saremo a disposizione tutto il giorno per demo e approfondimenti.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: