Cube Core version 1.3 is the latest release to date. It includes improvements to performance, data source support, and several API endpoints.
This release contains breaking changes. Before upgrading, please familiarize yourself with the following changes and make sure to adjust your data model and configuration if needed:
- Non-strict date range matches in pre-aggregations
- Removal of top-level includes and excludes in views
- New API scope for the /v1/sql endpoint
- Updates to data source concurrency
- Fixes to extends
Also, we recommend testing major and minor releases in a staging environment before promoting them to your production environment.
BREAKING: Non-strict date range matches in pre-aggregations
Previously, we introduced the allow_non_strict_date_range_match parameter for pre-aggregations to let them match queries with date ranges that are not necessarily aligned with the start or the end of pre-aggregation partitions. This is particularly useful when connecting Cube to BI tools such as Tableau or Apache Superset because they use such loose date ranges by default.
Now, this is the default behavior for all pre-aggregations, controlled via the CUBEJS_PRE_AGGREGATIONS_ALLOW_NON_STRICT_DATE_RANGE_MATCH environment variable. You can opt out for all pre-aggregations by using this environment variable, or for individual pre-aggregations by using their allow_non_strict_date_range_match parameter.
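For example, here's a minimal sketch of opting out for a single pre-aggregation (the cube and member names are hypothetical):

```yaml
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: count
        type: count

    dimensions:
      - name: created_at
        sql: created_at
        type: time

    pre_aggregations:
      - name: orders_by_day
        measures:
          - CUBE.count
        time_dimension: CUBE.created_at
        granularity: day
        # Opt this pre-aggregation out of non-strict date range matching
        allow_non_strict_date_range_match: false
```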
We think that the convenience and performance gains of using pre-aggregations to accelerate queries generated by BI tools far outweigh the tiny chance of data inaccuracy caused by non-strict date range matching.
BREAKING: Removal of top-level includes and excludes in views
Previously, it was possible to define a view using the top-level includes and excludes parameters. These parameters were deprecated in v0.34.34 and have now been removed.
Instead, the cubes and join_path parameters should always be used so that you can explicitly control join paths, as shown in the sketch below.
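A minimal sketch of the recommended syntax (the view, cube, and member names are hypothetical):

```yaml
views:
  - name: orders_view
    cubes:
      # Members included directly from the orders cube
      - join_path: orders
        includes:
          - status
          - count

      # Members joined through orders and users, prefixed to avoid name clashes
      - join_path: orders.users
        prefix: true
        includes:
          - city
```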
BREAKING: New API scope for the /v1/sql endpoint
Previously, the /v1/sql endpoint of the REST API was part of the data API scope.
Now, this endpoint has been moved to the new sql API scope, allowing for more granular access control. The sql scope is included in the list of default API scopes, configurable via the CUBEJS_DEFAULT_API_SCOPES environment variable. Unless you have explicitly configured that list, this change should not be breaking for you.
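For example, if you maintain a custom list of default API scopes, you'd need to add sql to it to keep the endpoint accessible. A sketch, assuming the other scope names below match the ones used in your deployment:

```
# Assumption: the other scope names are illustrative; only sql is new in this release
CUBEJS_DEFAULT_API_SCOPES=meta,data,graphql,sql
```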
Read more about improvements to the /v1/sql API endpoint below.
IMPORTANT: Updates to data source concurrency
To ensure optimal querying performance, concurrency settings were adjusted for a few popular data sources such as Amazon Athena, Amazon Redshift, ClickHouse, Databricks, and Snowflake.
There's a new documentation page that explains concurrency and lists concurrency settings for popular data sources. We recommend keeping the default configuration unless you have performed your own testing to ensure that your data source can handle more concurrent queries.
Finally, it's now possible to configure refresh worker concurrency via the dedicated CUBEJS_REFRESH_WORKER_CONCURRENCY environment variable.
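A minimal configuration sketch (the values are illustrative; CUBEJS_CONCURRENCY is assumed to be the relevant data source concurrency setting for your driver):

```
# Concurrent queries allowed against the data source (illustrative value)
CUBEJS_CONCURRENCY=4

# Separate concurrency limit for the refresh worker (illustrative value)
CUBEJS_REFRESH_WORKER_CONCURRENCY=2
```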
IMPORTANT: Fixes to extends
Various bugs related to cube inheritance and the extends parameter were fixed:
- Joins, pre-aggregations, and access policies are now correctly inherited.
- It is now possible to use the sql parameter in a parent cube and the sql_table parameter in a child cube, and vice versa.
In case your data model relied on the faulty behavior where some cube members were not correctly inherited, you can fall back to it by explicitly redefining the inherited members:
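For example, here's a minimal sketch of a child cube that explicitly redefines a measure it would otherwise inherit (the cube and member names are hypothetical):

```yaml
cubes:
  - name: base_orders
    sql_table: orders

    measures:
      - name: count
        type: count

  - name: orders_copy
    extends: base_orders

    # Redefine the inherited measure explicitly to pin the previous behavior
    measures:
      - name: count
        type: count
```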
Performance optimizations
Query orchestration speed-up with native code
We've implemented a number of performance optimizations, such as deserializing Cube Store result sets in native Rust code, that should have a positive impact on query performance. We're rolling these changes out in this release by setting the CUBEJS_TESSERACT_ORCHESTRATOR environment variable to true by default.
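If you need to revert to the previous behavior, you can presumably opt out by flipping the flag (shown as an assumption; the variable now defaults to true):

```
# Assumption: setting this to false falls back to the previous orchestration path
CUBEJS_TESSERACT_ORCHESTRATOR=false
```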
Compilation speed-up with worker threads
We've improved data model compilation by moving critical parts of the code to worker threads. You can opt in and enable this optimization by setting the CUBEJS_TRANSPILATION_WORKER_THREADS environment variable to true.
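For example, in your environment configuration:

```
# Opt in to running data model transpilation in worker threads
CUBEJS_TRANSPILATION_WORKER_THREADS=true
```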
If you encounter any issues or notice any changes or improvements in your deployments, please report them on GitHub.
New in data source support
Various fixes were applied to the Amazon Redshift, ClickHouse, Databricks, DuckDB, Dremio, and Snowflake drivers. Additionally, these improvements are included:
- Added an option to turn on client compression for ClickHouse.
- Added databasePath and motherDuckToken configuration options for DuckDB.
- Added options to configure StarTree authentication and null handling for Pinot.
- Migrated to the newest backwards-compatible vendor-developed driver for Databricks.
Key-pair authentication for Snowflake
Support for encrypted private keys was added for Snowflake. You can now use the CUBEJS_DB_SNOWFLAKE_PRIVATE_KEY_PASS environment variable to provide the passphrase for the encrypted key.
This is particularly important since Snowflake is deprecating single-factor password sign-ins for service users and recommends using key-pair authentication.
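A minimal sketch of a key-pair setup; only CUBEJS_DB_SNOWFLAKE_PRIVATE_KEY_PASS is introduced by this release, and the other variable names are assumptions that may differ in your deployment:

```
# Assumption: authenticator and key path variables may differ in your version
CUBEJS_DB_SNOWFLAKE_AUTHENTICATOR=SNOWFLAKE_JWT
CUBEJS_DB_SNOWFLAKE_PRIVATE_KEY_PATH=/secrets/snowflake_key.p8

# Passphrase for the encrypted private key (new in this release)
CUBEJS_DB_SNOWFLAKE_PRIVATE_KEY_PASS=********
```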
GCS export bucket for Databricks
You can now use export buckets on Google Cloud Storage with Databricks. With this update, Databricks support for export buckets covers all three major cloud platforms.
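A configuration sketch, assuming the export bucket variables below carry over from Cube's existing export bucket support (the variable names and values are assumptions):

```
# Assumption: these mirror Cube's export bucket settings for other platforms
CUBEJS_DB_EXPORT_BUCKET_TYPE=gcs
CUBEJS_DB_EXPORT_BUCKET=my-export-bucket
CUBEJS_DB_EXPORT_GCS_CREDENTIALS=<base64-encoded service account key>
```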
New in APIs
Support for rebuilding specific pre-aggregation partitions
The /v1/pre-aggregations/jobs API endpoint lets you trigger pre-aggregation build jobs or retrieve their statuses. It supports targeting pre-aggregations by security contexts, time zones, data sources, cubes, and names.
Now it also supports targeting specific partitions of those pre-aggregations by specifying a date range that the partitions' build ranges intersect with. See the following example utilizing the new dateRange parameter:
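A sketch of a request body, assuming the selector shape below (the cube and pre-aggregation names are hypothetical, and the exact placement of dateRange is an assumption):

```json
{
  "action": "post",
  "selector": {
    "contexts": [{ "securityContext": {} }],
    "timezones": ["UTC"],
    "preAggregations": ["orders.orders_by_day"],
    "dateRange": ["2025-01-01", "2025-01-31"]
  }
}
```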
Support for SQL API queries in the /v1/sql endpoint
The /v1/sql API endpoint takes an API query and returns the SQL query, generated by Cube, that can be executed against the data source. This is useful for debugging, understanding how Cube translates API queries into SQL queries, and providing transparency to SQL-savvy end users. Previously, only REST API queries were supported.
Now, you can also pass a SQL API query and get the generated query for it, if possible. See the following example utilizing the new format parameter:
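A sketch of such a request, assuming the SQL API query is passed in query and the new parameter accepts a value like sql (the payload shape is an assumption):

```json
{
  "format": "sql",
  "query": "SELECT status, COUNT(*) FROM orders GROUP BY 1"
}
```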
Response:
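An illustrative sketch of what the response might contain; the exact structure is an assumption:

```json
{
  "sql": {
    "sql": [
      "SELECT \"orders\".\"status\", COUNT(*) FROM \"orders\" GROUP BY 1",
      []
    ]
  }
}
```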
New in pre-aggregations
Support for specifying a subpath (folder) inside the storage bucket was introduced for the case when MinIO is used with Cube Store. It can be set via the new CUBESTORE_MINIO_SUB_PATH environment variable. Community members who have tested it report better stability of MinIO in this setup.
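A minimal sketch; only CUBESTORE_MINIO_SUB_PATH is new in this release, and the other variable names are assumptions that may differ in your setup:

```
# Assumption: existing MinIO settings; only the sub path variable is new
CUBESTORE_MINIO_BUCKET=cubestore
CUBESTORE_MINIO_SERVER_ENDPOINT=https://minio.example.com:9000
CUBESTORE_MINIO_ACCESS_KEY_ID=<access key>
CUBESTORE_MINIO_SECRET_ACCESS_KEY=<secret key>

# Store Cube Store data under this folder inside the bucket
CUBESTORE_MINIO_SUB_PATH=cube/prod
```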
New in the documentation
A few highlights to the updates in the documentation:
- New recipe: Implementing custom calendars, e.g., the 4-5-4 retail calendar.
- New recipe: Calculating filtered aggregates.
- API references have been moved to the APIs & integrations section. Example: SQL API reference.
Version upgrades
This release also upgrades Node.js, used internally by Cube, to v22. Node.js v20 is now deprecated, and support for Node.js v18 has been removed.
What's next?
After checking these release notes, please upgrade and give this release a try. You can also do that for free in Cube Cloud.
We can't wait to hear your feedback and thoughts in the Slack community, both about the updates above and the features from the Cube Core public roadmap.
Enjoy the new features and happy Cube-ing!