Adhoc makes it easy to define, code and review Excel-like formulas in the context of OLAP queries. It is compatible with a wide range of databases and query engines.
- The formulas define a DAG/Directed-Acyclic-Graph, going from abstract measures to aggregated measures. Intermediate formulas could also be used as measures.
- The DAG should be easily readable and modifiable by a human, not necessarily a developer.
- The DAG can express simple operations like `SUM` or `PRODUCT`, and complex operations like `GROUP BY` or any custom logic.
- Aggregated measures are evaluated by external databases (e.g. `GROUP BY` queries in SQL databases).
Most measures, including intermediate measures, hold a functional meaning. Hence, Adhoc can be seen as a semantic layer, enabling complex formulas over a snowflake SQL/NoSQL schema, or a composite of snowflake schemas.
This repository includes 2 products:
- Adhoc, which is the core project as it holds the query-engine. It is a plain Java library (e.g. no API, no web-server).
- Pivotable, which is a reference web-application around Adhoc. It provides APIs (over Spring WebFlux) and a Single-Page-Application (with VueJS).
- Excel Formulas. While data is externalized from Adhoc, the tree of measures can be seen as cells referencing one another through formulas.
- SQLServer Analysis Services Measures. We rely on many concepts from SQLServer to define our own abstractions.
- Apache Beam. Though Beam seems less flexible for accessing intermediate results as intermediate measures.
- MongoDB Aggregation Pipeline.
- DAX enables complex queries in the Microsoft eco-system.
- SquashQL is an SQL query-engine for OLAP, with a strong emphasis on its TypeScript UI.
- Atoti PostProcessors is the standard Atoti way of building complex trees of measures on top of Atoti cubes.
- Atoti DirectQuery enables Atoti to query an external database instead of its own in-memory cubes.
- Drill: much stronger for being queried in SQL by BI tools, but with an unclear strategy for defining a large tree of measures.
- Cube.js, but its ability to define a large tree of complex measures may not be sustainable.
- DBT Metrics, but its ability to define a large tree of complex measures may not be sustainable.
- Polars enables data-processing on a single machine.
- RAM: any JVM can run Adhoc, as Adhoc does not store data: it queries the underlying/external tables on-the-fly.
- CPU: any JVM can run Adhoc. If multiple cores are available, Adhoc will take advantage of them; but even a single-core JVM can run Adhoc queries smoothly.
- Ensure you have JDK 21 available.
- Add a Maven/Gradle dependency to `eu.solven.adhoc:adhoc:0.0.2` (artifacts are deployed to m2central: https://central.sonatype.com/artifact/eu.solven.adhoc/adhoc).
- Define an `ITableWrapper`: it defines how Adhoc can access your data.
Assuming your data is queryable with JooQ:
myTableWrapper = new JooqTableWrapper(JooqTableWrapperParameters.builder()
.dslSupplier(DuckDBHelper.inMemoryDSLSupplier())
.table(yourJooqTableLike)
.build());
For local .parquet files, it can be done with:
myTableWrapper = new JooqTableWrapper(JooqTableWrapperParameters.builder()
.dslSupplier(DuckDBHelper.inMemoryDSLSupplier())
.table(DSL.table(DSL.unquotedName("read_parquet('myRootFolder/2025-*-BaseFacts_*.parquet', union_by_name=True)")))
.build());
- Define a `MeasureForest`: it defines the measures and the links between them, through their underlying measures.
An early-stage forest could look like:
Aggregator k1Sum = Aggregator.builder().name("k1").aggregationKey(SumAggregator.KEY).build();
Aggregator k2Sum = Aggregator.builder().name("k2").aggregationKey(SumAggregator.KEY).build();
Combinator k1PlusK2AsExpr = Combinator.builder()
.name("k1PlusK2AsExpr")
.underlyings(Arrays.asList("k1", "k2"))
.combinationKey(ExpressionCombination.KEY)
.combinationOptions(ImmutableMap.<String, Object>builder().put("expression", "IF(k1 == null, 0, k1) + IF(k2 == null, 0, k2)").build())
.build();
MeasureForest.MeasureForestBuilder forestBuilder = MeasureForest.builder();
forestBuilder.addMeasure(k1Sum);
forestBuilder.addMeasure(k2Sum);
forestBuilder.addMeasure(k1PlusK2AsExpr);
- Define an Adhoc engine: it knows how to execute a query given the measure relationships.
CubeQueryEngine engine = CubeQueryEngine.builder().eventBus(AdhocTestHelper.eventBus()).forest(forestBuilder.build()).build();
- Define and execute your query:
ITabularView view = engine.execute(CubeQuery.builder().measure("k1PlusK2AsExpr").debug(true).build(), jooqDb);
MapBasedTabularView mapBased = MapBasedTabularView.load(view);
Assertions.assertThat(mapBased.keySet().map(SliceAsMap::getCoordinates).toList())
        .containsExactly(Map.of());
Assertions.assertThat(mapBased.getCoordinatesToValues())
        .containsEntry(Map.of(), Map.of("k1PlusK2AsExpr", 0L + 123 + 234));
flowchart TD
Pivotable --> ICubeWrapper
ICubeWrapper --> IMeasureForest
ICubeWrapper --> ITableWrapper
ICubeWrapper --> ICubeQueryEngine
ICubeWrapper --> IQueryPreparator
IQueryPreparator --> IQueryStepCache
IQueryPreparator --> IImplicitFilter
IQueryPreparator --> IImplicitOptions
IQueryPreparator --> ExecutorService
ICubeQueryEngine --> ITableQueryEngine
ITableQueryEngine --> ITableWrapper
ITableQueryEngine --> ITableQueryOptimizer
ITableWrapper --> DuckDB
ITableWrapper --> RedShift
ITableWrapper --> BigQuery
flowchart TD
CubeQuery --> CubeQueryStep_userMeasure
CubeQueryStep_userMeasure --> CubeQueryStep_underlyingMeasure
CubeQueryStep_underlyingMeasure --> CubeQueryStep_aggregator
CubeQueryStep_aggregator --> SplitTableQueries
SplitTableQueries --> TableQueryV2
- `CubeQuery --> CubeQueryStep_userMeasure` is mostly managed by `CubeQueryEngine.getRootMeasures(...)`.
- `CubeQueryStep_userMeasure --> CubeQueryStep_underlyingMeasure` is mostly managed by `CubeQueryEngine.makeQueryStepsDag(...)`.
- `CubeQueryStep_aggregator --> SplitTableQueries` is mostly managed in `ITableQueryOptimizer.splitInduced(...)`. It builds a DAG of `Aggregator` `CubeQueryStep`s. The roots of this DAG are `inducers`, able to infer the `induced`.
- `SplitTableQueries --> TableQueryV2` is mostly managed in `ITableQueryOptimizer.packStepsIntoTableQueries(...)`. It defines `TableQueryV2`s able to infer the `inducers`.
A CubeQuery is similar to a `SELECT ... WHERE ... GROUP BY ...` SQL statement. It is defined by:
- a list of `groupBy` columns
- a set of `filter` clauses
- a list of measures, being either aggregated or transformed measures
graph TB
sum -- haircut --> delta
sum -- haircut --> gamma
sum.FR -- country=France --> sum
ratio.FR --> sum.FR
ratio.FR --> sum
Adhoc is not a database, it is a query engine. It knows how to execute complex KPI queries, typically defined as complex graphs of logic. The leaves of these graphs are pre-aggregated measures, to be provided by external tables.
Typical tables are:
- CSV or Parquet files: Adhoc recommends querying local/remote CSV/Parquet files through DuckDB, with the JooqTableWrapper.
- Any SQL table: you should rely on JooqTableWrapper, possibly requiring a Professional or Enterprise JooQ license.
- ActivePivot/Atoti
- Your own database implementing `ITableWrapper`
There are several different kinds of `IAdhocColumn`:
- `ReferencedColumn` are standard columns, as provided by the `ITableWrapper` (e.g. some column from some SQL table).
- `EvaluatedExpressionColumn` are columns evaluated given some expression by the `ITableWrapper` (e.g. an SQL expression like `c AS a || '-' || b`).
- `ICalculatedColumn` are evaluated by the cube, given underlying columns (e.g. an expression like `a + '-' + b`).
- `IColumnGenerator` are evaluated by the cube, providing additional columns and members, independently of other columns. They are typically generated by measures (e.g. a many-to-many measure), and suppressed before reaching the `ITableWrapper`.
The values taken by a column are named coordinates. In similar contexts, they may be referred to as members (e.g. in Analysis Services hierarchies).
Given tables may hold similar data but with different column names. An ITableWrapper enables coding such a mapping once per table.
A default ITableWrapper assumes ICubeQuery columns match the ITableWrapper columns.
In case of a table with JOINs, one would often encounter ambiguities when querying a field. For instance when:
- querying a field used in a JOIN definition: the same name may appear in multiple tables
- querying joined tables with `*`, while tables have conflicting field names

In such a case, one can resolve ambiguities in an ITableWrapper. For instance:
MapTableAliaser.builder()
.queriedToUnderlying("someColumn", "someTable.someColumn")
.build()

Tables may not all accept queries with similar types. Typically, one may filter a column with an enum while the given enum type may be unknown to the table.
This can be managed with a ICustomTypeManager, which will handle type-transcoding on a per-column per-value basis.
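The ICustomTypeManager interface itself is not reproduced here; a minimal sketch of the idea — transcoding a rich cube-side value (an enum) into the plain value the table understands, on a per-column basis — could look like the following (all names are illustrative, not Adhoc API):

```java
import java.util.Map;
import java.util.function.Function;

// Sketch of per-column type transcoding, in the spirit of ICustomTypeManager.
// TypeTranscoder and Color are hypothetical names, not part of Adhoc.
public class TypeTranscoder {
    enum Color { RED, GREEN }

    // Per-column converters: the cube filters with rich types, the table sees plain ones
    static final Map<String, Function<Object, Object>> COLUMN_TRANSCODERS =
            Map.of("color", v -> v instanceof Enum<?> e ? e.name() : v);

    static Object toTable(String column, Object cubeValue) {
        return COLUMN_TRANSCODERS.getOrDefault(column, Function.identity()).apply(cubeValue);
    }

    public static void main(String[] args) {
        // The enum is unknown to the table: it is sent as its String name
        System.out.println(toTable("color", Color.RED)); // RED
        System.out.println(toTable("size", 3)); // 3 (no transcoder: passed through)
    }
}
```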
A measure can be:
- an aggregated measure (a column aggregated by an aggregation function)
- a transformed measure (one or multiple measures mixed together, possibly with additional `filter` and/or `groupBy` clauses)
A set of measures defines a Directed-Acyclic-Graph, where leaves are pre-aggregated measures and nodes are transformed measures. The DAG is typically evaluated on a per-query basis, as the CubeQuery groupBy and filter have to be combined with the measures' own groupBys and filters.
Measures are evaluated for a slice, defined by the groupBy and the filter of its parent node. The root nodes have their groupBy and filter defined by the CubeQuery.
- Combinator neither changes the `groupBy` nor the `filter`.
- Filtrator adds a `filter`, AND-ed with the node's own `filter`.
- Partitionor adds a `groupBy`, UNION-ed with the node's own `groupBy`.
Aggregations are used to reduce input data up to the requested (by groupBys) granularity. Multiple aggregation functions may be applied over the same column.
See https://support.microsoft.com/en-us/office/aggregate-function-43b9278e-6aa7-4f17-92b6-e19993fa26df
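The reduction to the requested granularity can be sketched with plain Java streams — here two aggregation functions (`SUM` and `MAX`) applied over the same column `k1`, grouped by `country` (names are illustrative, not Adhoc API):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: reducing rows to the requested groupBy granularity, applying
// two aggregation functions over the same column.
public class GroupByAggregations {
    record Row(String country, long k1) {}

    static Map<String, Long> sumByCountry(List<Row> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(Row::country, Collectors.summingLong(Row::k1)));
    }

    static Map<String, Long> maxByCountry(List<Row> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(Row::country,
                        Collectors.reducing(Long.MIN_VALUE, Row::k1, Long::max)));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("FR", 123), new Row("FR", 234), new Row("DE", 456));
        System.out.println(sumByCountry(rows)); // FR=357, DE=456
        System.out.println(maxByCountry(rows)); // FR=234, DE=456
    }
}
```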
ExpressionAggregation enables custom expressions for table processing.
For instance, in DuckDB, one can use the syntax `SUM("v") FILTER (color = 'red')`. It can be used as an aggregator:
Aggregator.builder()
.name("v_RED")
.aggregationKey(ExpressionAggregation.KEY)
.column("max(\"v\") FILTER(\"color\" in ('red'))")
.build();
On top of aggregated measures, one can define transformators.
- Combinator: the simplest transformation; it evaluates a formula over underlying measures (e.g. `sumMeasure=a+b`).
- Filtrator: evaluates the underlying measure with an additional filter. The node `filter` is AND-ed with the `measure` filter. Hence, if the query filters `country=France` and the filtrator filters `country=Germany`, then the result is empty.
- Partitionor: evaluates the underlying measures with an additional groupBy, then aggregates up to the node granularity.
- Dispatchor: given a cell, it contributes into multiple cells. Useful for `many-to-many` or `rebucketing`.
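The Filtrator's AND semantics can be sketched with plain predicates — contradictory query and measure filters yield an empty result (names are illustrative, not Adhoc API):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch: a Filtrator's filter is AND-ed with the query filter. With
// country=France from the query and country=Germany from the measure,
// the combined predicate matches no row.
public class FiltratorAnd {
    static Predicate<Map<String, Object>> columnEquals(String column, Object value) {
        return slice -> value.equals(slice.get(column));
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> queryFilter = columnEquals("country", "France");
        Predicate<Map<String, Object>> measureFilter = columnEquals("country", "Germany");
        Predicate<Map<String, Object>> combined = queryFilter.and(measureFilter);

        List<Map<String, Object>> rows = List.of(
                Map.of("country", "France"), Map.of("country", "Germany"));
        long matching = rows.stream().filter(combined).count();
        System.out.println(matching); // 0: contradictory filters yield an empty result
    }
}
```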
In Analysis-Services, Many-to-Many is a feature enabling a fact (i.e. an input row) to contribute into multiple coordinates of a given column.
For instance, in a flattened GeographicalZone column (e.g. having flattened a hierarchical Region->Country->City), a single Paris fact would contribute into Paris, France and Europe.
This can be achieved in Adhoc with a Dispatchor.
A full example is visible in TestManyToManyCubeQuery. It demonstrates how a measure can:
- query underlying measures on fine grained slices (e.g. the input slices of the many-to-many)
- project each of these slices into 0, 1 or N slices (e.g. the output slices of the many-to-many)
- generate its own columns, independently of the underlying table (`IColumnGenerator`). In a many-to-many, the group column is generated by the measure, while the elements are generally generated by the table (or a previous many-to-many).
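The decomposition step can be sketched as a plain map from each input slice to its output slices, followed by a SUM aggregation (the city/zone mapping here is illustrative, not taken from TestManyToManyCubeQuery):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a many-to-many decomposition: each input slice (a city)
// contributes into several output slices (city, country, continent).
public class ManyToManySketch {
    static final Map<String, List<String>> CITY_TO_ZONES = Map.of(
            "Paris", List.of("Paris", "France", "Europe"),
            "Berlin", List.of("Berlin", "Germany", "Europe"));

    static Map<String, Long> dispatch(Map<String, Long> factByCity) {
        Map<String, Long> byZone = new TreeMap<>();
        factByCity.forEach((city, value) -> CITY_TO_ZONES
                .getOrDefault(city, List.of(city))
                .forEach(zone -> byZone.merge(zone, value, Long::sum)));
        return byZone;
    }

    public static void main(String[] args) {
        System.out.println(dispatch(Map.of("Paris", 100L, "Berlin", 50L)));
        // {Berlin=50, Europe=150, France=100, Germany=50, Paris=100}
    }
}
```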
For some needs, the aggregation applies not over doubles or longs, but over complex objects. For Value-at-Risk, the aggregated object is a `double[]` of constant length.
An example for such a use-case is demonstrated in TestTableQuery_DuckDb_VaR.
The ability to decompose along a scenarioIndex or scenarioName column, which is not defined in the table (the table provides one `double[]` per row), is achieved with a Dispatchor. Indeed, it can be seen as a 1-row-to-many-scenarioIndexes decomposition.
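The core of such an aggregation — merging `double[]` vectors of constant length element-wise — can be sketched as follows (a minimal stand-in for a custom IAggregation, not the one in TestTableQuery_DuckDb_VaR):

```java
import java.util.Arrays;

// Sketch: aggregating double[] vectors of constant length element-wise,
// as a Value-at-Risk style custom aggregation would.
public class VectorSum {
    static double[] aggregate(double[] left, double[] right) {
        // Both vectors are assumed to have the same, constant length
        double[] out = new double[left.length];
        for (int i = 0; i < left.length; i++) {
            out[i] = left[i] + right[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] pnlRow1 = {1.0, -2.0, 3.0};
        double[] pnlRow2 = {0.5, 0.5, 0.5};
        System.out.println(Arrays.toString(aggregate(pnlRow1, pnlRow2)));
        // [1.5, -1.5, 3.5]
    }
}
```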
`int`s are generally treated as `long`s:
- Aggregations (e.g. `SUM`) will automatically turn `int` into `long`.
- `EqualsMatcher`, `InMatcher` and `ComparingMatcher` will automatically turn `int` into `long`.

`float`s are generally treated as `double`s:
- `SUM` will automatically turn `float` into `double`.
- `EqualsMatcher`, `InMatcher` and `ComparingMatcher` will automatically turn `float` into `double`.

Aggregations should generally aggregate as `long`, else `double`.
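This widening behavior can be sketched in a few lines — normalizing `int` to `long` and `float` to `double` before comparing or aggregating (a simplified illustration, not Adhoc's actual implementation):

```java
// Sketch: widening int to long and float to double, mirroring how
// aggregations and matchers normalize primitive types.
public class WidenPrimitives {
    static Object widen(Object v) {
        if (v instanceof Integer i) {
            return i.longValue();
        } else if (v instanceof Float f) {
            return f.doubleValue();
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(widen(3)); // a Long
        System.out.println(widen(1.5F)); // a Double
        // After widening, 3 (int) and 3L compare equal
        System.out.println(widen(3).equals(3L)); // true
    }
}
```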
Sometimes, one wants to filter the visible members along some columns, without filtering the actual query. Typically, one may want to query the ratio France/Europe by filtering the France country, without restricting Europe to France-only. For now, this cannot be easily done.

The underlying issue is that one may have a column filtering Country-with_firstLetterIsForG. Assuming we have a measure returning currentCountry/Europe, where currentCountry is the country on the Country column, if we filter Country-with_firstLetterIsForG=true in the query, should we show France/(France+Germany) or France/Europe?
- We may introduce a special `groupBy`, where we would express that we groupBy `country` but only show `Country-with_firstLetterIsForG=true`.
- We may introduce a special `filter`, stating that `Country-with_firstLetterIsForG=true` is a `Visual` filter. It resonates with https://learn.microsoft.com/en-us/sql/mdx/visualtotals-mdx?view=sql-server-ver16
Typical errors with Adhoc are:
- Issues referencing underlying table columns
- Unexpected data returned by the underlying table
Tools to investigate these issues are:
- Enable `debug` in your query: `CubeQuery.builder()[...].debug(true).build()`
- Enable `explain` in your query: `CubeQuery.builder()[...].explain(true).build()`
ForestAsGraphvizDag can be used to generate GraphViz .dot files given an IMeasureForest.
StandardQueryOptions.DEBUG will enable various additional [DEBUG] logs at INFO logLevel. It may also conduct additional operations (like executing some sanity checks), or enforce some ordering to facilitate investigations. It may lead to very poor performance.
StandardQueryOptions.EXPLAIN will provide additional information about the on-going query. It will typically log the query executed to the underlying table.
At the bottom of the DAG of cubeQuerySteps, the measures are evaluated by an external table, applying aggregation functions for given GROUP BY and WHERE clauses.
Here is an example of such a DAG:
graph TB
cubeQuery[kpiA, kpiB on X by L]
subgraph user query
cubeQuery
end
cubeQuery --> measureA_cubeContext
cubeQuery --> measureE_cubeContext
subgraph cube DAG
measureA_cubeContext[kpiA on X by L]
measureB_cubeContext[kpiB on X by L]
measureC_cubeContext_v2[kpiC on X by L&M]
measureD_cubeContext_v3[kpiD on X by L]
measureE_cubeContext[kpiD on Y by L]
measureF_cubeContext_v3[kpiD on X&Y by L&M]
end
measureA_cubeContext --> measureB_cubeContext
measureB_cubeContext --> measureC_cubeContext_v2
measureB_cubeContext --> measureD_cubeContext_v3
measureE_cubeContext --> measureC_cubeContext_v2
measureE_cubeContext --> measureF_cubeContext_v3
subgraph table queries
tableQuery_tableContext_v2[kpiD on X by L]
tableQuery_tableContext_v3["kpiC, kpiD(on Y) on X by L&M"]
end
measureC_cubeContext_v2 --> tableQuery_tableContext_v3
measureD_cubeContext_v3 --> tableQuery_tableContext_v2
measureF_cubeContext_v3 --> tableQuery_tableContext_v3
SQL integration is provided with the help of JooQ. To query a complex star/snowflake schema (i.e. with many/deep joins), one should provide a TableLike expressing these JOINs.
For instance:
Table<Record> fromClause = DSL.table(DSL.name(factTable))
.as("f")
.join(DSL.table(DSL.name(productTable)).as("p"))
.using(DSL.field("productId"))
.join(DSL.table(DSL.name(countryTable)).as("c"))
.using(DSL.field("countryId"));

Such a snowflake schema can be built more easily with the help of JooqSnowflakeSchemaBuilder.
See eu.solven.adhoc.column.IMissingColumnManager.onMissingColumn(String)
Right-management is typically implemented by an AND operation combining the user filter and a filter based on user-rights.
This can be achieved through IImplicitFilter. A Spring-Security example is demonstrated in TestImplicitFilter_SpringSecurity.
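The principle — AND-combining the user filter with a rights-based filter — can be sketched with plain predicates (the rights rule below is hypothetical, and this is not the IImplicitFilter interface itself):

```java
import java.util.Map;
import java.util.function.Predicate;

// Sketch: AND-combining the user filter with a rights-based filter,
// as an IImplicitFilter implementation would.
public class ImplicitRightsFilter {
    // Hypothetical rights: the current user may only see country=France
    static Predicate<Map<String, Object>> rightsFilter() {
        return slice -> "France".equals(slice.get("country"));
    }

    static Predicate<Map<String, Object>> secure(Predicate<Map<String, Object>> userFilter) {
        return userFilter.and(rightsFilter());
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> userFilter = slice -> "red".equals(slice.get("color"));
        Predicate<Map<String, Object>> secured = secure(userFilter);

        System.out.println(secured.test(Map.of("color", "red", "country", "France"))); // true
        System.out.println(secured.test(Map.of("color", "red", "country", "Germany"))); // false
    }
}
```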
- How is the given performance achieved?

Adhoc's design delegates most slow-to-compute sections to the underlying table. And the 2020s brought a bunch of very fast databases (e.g. DuckDB, RedShift, BigQuery).

- Can Adhoc handle indicators based on complex structures like `array` or `struct`?

Most databases handle aggregations over primitive types (e.g. SUM over doubles). Adhoc can aggregate any type, given you implement your own IAggregation.
This section does not refer to data storage in Adhoc (as, by principle, Adhoc does not store data). But it is about mechanisms used in the library to manage data, especially primitive types.
The general motivation is:
- Prevent boxing/unboxing as much as possible: one should be able to rely on primitive types, especially in critical sections of the engine. This enables better performance and lower GC pressure.
- Focus on `long`, `double` and `Object`. `int` is managed as `long`; `float` is managed as `double`.
- Provide an easy way to rely on plain `Object`s, until later optimization phases enable `long`- and `double`-specific management.
An IValueReceiver receives data. Incoming data may be a long, a double or an Object (possibly null). The Object is not guaranteed not to be a (boxed) long or double.

An IValueProvider provides data. Outgoing data may be a long, a double or an Object (possibly null). The Object is not required not to be a (boxed) long or double.
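The provider/receiver hand-shake can be sketched with simplified look-alike interfaces — the provider pushes a long, a double or an Object into the receiver, so primitive values can flow without boxing (these are not the actual Adhoc interfaces):

```java
// Sketch of the IValueProvider/IValueReceiver hand-shake: primitive
// values flow through dedicated callbacks, avoiding boxing.
public class ValueFlow {
    interface ValueReceiver {
        void onLong(long v);
        void onDouble(double v);
        void onObject(Object v);
    }

    interface ValueProvider {
        void acceptReceiver(ValueReceiver receiver);
    }

    // A provider holding a primitive long: no Long is ever allocated
    static ValueProvider longProvider(long v) {
        return receiver -> receiver.onLong(v);
    }

    static long sumAsLongs(ValueProvider... providers) {
        long[] sum = {0L};
        ValueReceiver summing = new ValueReceiver() {
            public void onLong(long v) { sum[0] += v; }
            public void onDouble(double v) { sum[0] += (long) v; }
            public void onObject(Object v) { throw new IllegalArgumentException("Not a number: " + v); }
        };
        for (ValueProvider provider : providers) {
            provider.acceptReceiver(summing);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(sumAsLongs(longProvider(123), longProvider(234))); // 357
    }
}
```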
The IOperatorsFactory is a way to inject custom logic in many places of the application. It is generally oriented towards measure customization. Specifically, it can create:
- `IAggregation`: typically used by `Aggregator`; it expresses how to merge recursively any number of operands into an operand of similar type (e.g. `SUM`).
- `ICombination`: typically used by `Combinator`; it can be seen as an operator over a fixed number of operands (e.g. `SUBSTRACTION`).
- `IDecomposition`: typically used by `Dispatchor`; it can be seen as an operator splitting an input into a `List` of `IDecompositionEntry` (e.g. `G8=FR+UK+etc`).
- `IFilterEditor`: typically used by `Filtrator`; it can be seen as an operator mutating an `IAdhocFilter`.
An `IAdhocFilter` is a way to restrict the data to be considered, on a per-column basis. The set of filters is quite small:
- `AndFilter`: an `AND` boolean operation over underlying `IAdhocFilter`s. If there is no underlying, this is a `.matchAll`.
- `OrFilter`: an `OR` boolean operation over underlying `IAdhocFilter`s. If there is no underlying, this is a `.matchNone`.
- `NotFilter`: a `!` or `NOT` boolean operation over the underlying `IAdhocFilter`.
- `IColumnFilter`: an operator over a specific column with a given `IValueMatcher`.
An `IValueMatcher` applies to any Object. The variety of `IValueMatcher`s is quite large, and easily extendible:
- `EqualsMatcher`: true if the input is equal to some pre-defined `Object`.
- `NullMatcher`: true if the input is null.
- `LikeMatcher`: true if the input's `.toString` representation matches the registered `LIKE` expression.
- `RegexMatcher`: true if the input's `.toString` representation matches the registered regex expression.
- etc.
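A LIKE-style matcher can be sketched by translating the SQL `LIKE` wildcards (`%` and `_`) into a regex — a simplified stand-in for `LikeMatcher`, not its actual implementation:

```java
import java.util.regex.Pattern;

// Sketch: a LIKE-style value matcher, implemented by mapping SQL LIKE
// wildcards onto regex constructs.
public class LikeMatcherSketch {
    static boolean like(String likeExpression, Object input) {
        // Quote regex metacharacters, then map LIKE wildcards onto regex ones
        String regex = Pattern.quote(likeExpression)
                .replace("%", "\\E.*\\Q")
                .replace("_", "\\E.\\Q");
        return input != null && Pattern.matches(regex, input.toString());
    }

    public static void main(String[] args) {
        System.out.println(like("F%", "France")); // true
        System.out.println(like("F%", "Germany")); // false
        System.out.println(like("_rance", "France")); // true
    }
}
```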
Adhoc provides a StandardOperatorFactory including generic operators (e.g. SUM).
- It can refer to custom operators by their `Class.getName()` as key.
- Your custom `IAggregation`/`ICombination`/`IDecomposition`/`IFilterEditor` should then have:
  - either an empty constructor
  - or a `Map<String, ?>` single-parameter constructor.
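The constructor resolution can be sketched with plain reflection — try a `Map<String, ?>` single-argument constructor first, then fall back to the empty one (a simplified illustration; `OperatorInstantiator` and `MyAggregation` are hypothetical names, not Adhoc classes):

```java
import java.util.Map;

// Sketch: resolving a custom operator by Class.getName(), trying first a
// Map single-argument constructor, then an empty constructor.
public class OperatorInstantiator {
    // An example custom operator with a Map-based constructor
    public static class MyAggregation {
        final Object option;

        public MyAggregation(Map<String, ?> options) { this.option = options.get("someOption"); }

        public MyAggregation() { this.option = null; }
    }

    static Object instantiate(String key, Map<String, ?> options) {
        try {
            Class<?> clazz = Class.forName(key);
            try {
                return clazz.getConstructor(Map.class).newInstance(options);
            } catch (NoSuchMethodException e) {
                return clazz.getConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate: " + key, e);
        }
    }

    public static void main(String[] args) {
        Object operator = instantiate(MyAggregation.class.getName(), Map.of("someOption", "someValue"));
        System.out.println(((MyAggregation) operator).option); // someValue
    }
}
```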
One may also define a custom IOperatorsFactory:
- by extending it
- by creating your own `IOperatorsFactory` and combining it with `CompositeOperatorFactory`
- by adding a fallback strategy with `DummyOperatorFactory`
Humans are generally happier when things go faster. Adhoc enables split-second queries over the underlying table. Very large queries can be performed with limited resources (e.g. a JVM with a few GB of RAM) and may take seconds/minutes.
The limiting factor in terms of performance is generally the underlying table, which executes aggregations at the granularity requested by Adhoc: the one induced by the user GROUP BY, plus those implied by some formulas (e.g. a Partitionor by Currency).
Hence, we do not target absolute performance in Adhoc. In other words, we prefer things to remain slightly slower, as long as it enables this project to remain simpler, given a query is generally slow due to the underlying ITableWrapper.
Adhoc performances can be improved by:
- Scaling horizontally: each Adhoc instance is stateless, and can operate a user query independently of other shards. There is no plan to enable a single Adhoc query to be dispatched through a cluster of Adhoc instances, but it may be considered if some project would benefit from such a feature.
- Enabling caching (e.g. `CubeQueryStep` caching).
Concurrency is enabled by default (StandardQueryOptions.CONCURRENT), but it can be disabled through StandardQueryOptions.SEQUENTIAL.
Non-concurrent queries are executed in the calling-thread (e.g. MoreExecutors.newDirectExecutorService()).
Concurrent queries are executed in Adhoc own Executors.newWorkStealingPool. It can be customized through AdhocUnsafe.adhocCommonPool.
Concurrent sections are:
- subQueries in a `CompositeCubesTableWrapper`: each subCube may be queried concurrently.
- `CubeQueryStep`s in a DAG: independent tasks may be executed concurrently.
- tableQueries induced by leaf `CubeQuerySteps`: independent tableQueries may be executed concurrently.
If you encounter a case whose performance would be much improved by multi-threading, please report its specificities through a new issue. A benchmark / unitTest demonstrating the case would be very helpful.
Parallelism can be configured with AdhocUnsafe.parallelism or through `-Dadhoc.parallelism=16`.
Adhoc does not load data, as its results are always based on results from the underlying databases/ITableWrapper. Still, it may be necessary to enable data customization, similarly to ETL operations.
There are a few options to achieve such behavior.
If you use JooqTableWrapper and the underlying database enables transient storage (e.g. DuckDB), you can add a table and enrich your JooQ table.
public void createCustomTable(DSLSupplier dslSupplier) {
DSLContext dslContext = dslSupplier.getDSLContext();
dslContext.createTable("customTable")
.column("customKey", SQLDataType.VARCHAR)
.column("customColumn", SQLDataType.VARCHAR)
.execute();
dslContext.connection(c -> {
DuckDBConnection duckDbC = (DuckDBConnection) c;
DuckDBAppender appender = duckDbC.createAppender("main", "customTable");
ImmutableMap.<String, String>builder()
.put("keyA", "customA")
.put("keyB", "customB")
.put("keyC", "customC")
.build().forEach((key,value) -> {
try {
appender.beginRow();
appender.append(key);
appender.append(value);
appender.endRow();
} catch (SQLException e) {
throw new RuntimeException(e);
}
});
});
}

and then add a join to your JooQ table:
public void joinTable(Table baseTable) {
baseTable.leftJoin(DSL.table("customTable")).on(DSL.field("rawValue").eq(DSL.field("customKey")));
}

The column `customColumn` (or `"customTable"."customColumn"`) can now be referenced like any other column.
ICalculatedColumn enables computing such an additional column in the cube itself, given underlying columns, without modifying the table.
See OPTIMISATIONS.MD for details about optimizations involved in Adhoc.
See UNSAFE.MD for details about how to perform advanced tweaks in Adhoc.
- m2-central
- Changes: CHANGES.MD
- Roadmap: ROADMAP.MD
Known limitations would generally trigger a NotYetImplementedException: please open a ticket to report your actual use-case for the given scenario.
Thanks EJ Technologies for kindly providing an OpenSource license for JProfiler (their Java Profiler).