Multi-Tenancy in Apache Hop

Introduction

In modern data-driven applications, multi-tenancy is essential for efficiently managing multiple customers, business units, or organizational divisions within a shared infrastructure. Apache Hop and Putki provide various multi-tenancy approaches at both the database and file system levels, ensuring flexibility in deployment and resource management.

Figure: Single-tenant vs. multi-tenant overview

This document explores different multi-tenancy strategies, their advantages and challenges, and a recommended blueprint architecture for implementing multi-tenancy in Apache Hop and Putki. While our focus is on relational databases, similar principles apply to NoSQL databases like Neo4j or MongoDB.

Multi-Tenancy Approaches

The most common type of multi-tenancy is data multi-tenancy, which allows developers to enforce custom data access rules at runtime. A tenant may only access data associated with their tenant ID. Below are four key approaches to multi-tenancy in Apache Hop and Putki:

1. Sharding (Database-Level Separation)

Each tenant's data is stored in a separate database or schema. Separate databases give strict physical separation between tenants, while separate schemas within one database give logical separation.

Figure: Database-level vs. schema-level sharding
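As a rough illustration of database-level sharding, the sketch below routes each tenant to its own database connection. It uses Python's built-in sqlite3 module with in-memory databases purely as a stand-in for real per-tenant databases. The tenant names, the `TENANT_DATABASES` registry, and the `sales` table are hypothetical and not part of Apache Hop or Putki.

```python
import sqlite3

# Hypothetical tenant registry: each tenant maps to its own database.
# ":memory:" stands in for a real per-tenant database location.
TENANT_DATABASES = {
    "company_x": ":memory:",
    "company_y": ":memory:",
}

_connections = {}


def connection_for(tenant_id):
    """Return the dedicated connection for a tenant, creating it on first use."""
    if tenant_id not in TENANT_DATABASES:
        raise KeyError(f"unknown tenant: {tenant_id}")
    if tenant_id not in _connections:
        conn = sqlite3.connect(TENANT_DATABASES[tenant_id])
        conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
        _connections[tenant_id] = conn
    return _connections[tenant_id]


def sales_count(tenant_id):
    """Count rows in the tenant's own database; no tenant filter is needed."""
    cursor = connection_for(tenant_id).execute("SELECT COUNT(*) FROM sales")
    return cursor.fetchone()[0]


# Each insert lands in a different physical database.
connection_for("company_x").execute("INSERT INTO sales (amount) VALUES (100.0)")
connection_for("company_y").execute("INSERT INTO sales (amount) VALUES (5.0)")
connection_for("company_y").execute("INSERT INTO sales (amount) VALUES (7.0)")
```

Because every tenant has its own connection, queries need no tenant filter, and a mistake in one query cannot expose another tenant's rows.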

2. Striping (Shared Database, Tenant ID Column)

All tenants share a single database, but each table includes a tenant ID column to separate data logically.

Figure: Striping with a tenant ID column
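A minimal sketch of striping, assuming a single shared table with a `tenant_id` column. It uses sqlite3 only for illustration. The table layout, tenant names, and the `sales_for_tenant` helper are assumptions, not an Apache Hop or Putki API.

```python
import sqlite3

# One shared database for all tenants.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, "
    "tenant_id TEXT NOT NULL, "
    "amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO sales (tenant_id, amount) VALUES (?, ?)",
    [("company_y", 10.0), ("company_y", 20.0), ("company_z", 99.0)],
)


def sales_for_tenant(connection, tenant_id):
    """Return only the amounts that belong to the given tenant."""
    rows = connection.execute(
        "SELECT amount FROM sales WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),  # bound parameter, never string concatenation
    )
    return [amount for (amount,) in rows]
```

Every query against a striped table has to carry the tenant filter, which is why this approach depends on strict access control.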

3. Data Models (Row-Level Access Control)

Tenancy is enforced at the data level, allowing different tenants (or sub-tenants) to access only specific data.

Figure: Row-level access control data models
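To make row-level access control concrete, here is a small sketch in plain Python. The `User` type, the role names `manager` and `sales_rep`, and the sample rows are all hypothetical. In practice this rule would typically be enforced by the database or target platform rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    tenant_id: str
    role: str  # hypothetical roles: "manager" or "sales_rep"


# Sample rows; "owner" marks which user a record belongs to.
SALES = [
    {"tenant_id": "company_y", "owner": "alice", "amount": 10.0},
    {"tenant_id": "company_y", "owner": "bob", "amount": 20.0},
    {"tenant_id": "company_z", "owner": "carol", "amount": 99.0},
]


def visible_rows(user, rows):
    """Apply the tenant boundary first, then the role rule within the tenant."""
    result = []
    for row in rows:
        if row["tenant_id"] != user.tenant_id:
            continue  # never cross the tenant boundary
        if user.role == "manager" or row["owner"] == user.name:
            result.append(row)
    return result


def visible_amounts(user):
    return [row["amount"] for row in visible_rows(user, SALES)]
```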

4. Hybrid (Combining Sharding, Striping, and Data Models)

This approach mixes elements of the previous three, depending on security, scalability, and performance needs.

A hybrid model is often used because no single approach fits all tenants' needs.

Figure: Hybrid multi-tenancy

Example of a Hybrid Multi-Tenancy Approach

A Software-as-a-Service (SaaS) company provides a data analytics platform for multiple customers (tenants). Since their clients have different security, scalability, and performance requirements, they use a hybrid approach that combines sharding, striping, and data models.

Hybrid Approach Configuration
  1. Sharding for Large Enterprises
    • Large enterprise customers with strict security requirements and high data volumes are assigned a separate database (shard).
    • Example: "Company X" has a dedicated database due to data protection regulations.
  2. Striping for Medium-Sized Businesses
    • Mid-sized clients share the same database, but data is tagged with a Tenant ID to enforce access restrictions.
    • Example: "Company Y" and "Company Z" use the same database but can only access their own records.
  3. Data Models with Row-Level Security for Internal Users
    • Users within the same tenant have role-based access to specific rows of data.
    • Example: Within "Company Y," managers can view all sales reports, while individual sales reps can only see their own sales data.
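The configuration above could be captured in a small routing table. The sketch below is one possible way to decide, per tenant, which database to use and whether a tenant ID filter is required. The database names, the `TenantRoute` type, and the `route_for` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenantRoute:
    database: str                 # which database the tenant's data lives in
    tenant_filter: Optional[str]  # tenant ID to filter on, or None if dedicated


# Hypothetical assignments mirroring the example companies.
DEDICATED = {"company_x": "db_company_x"}   # sharded: own database
SHARED_DATABASE = "db_shared"               # striped: shared database
SHARED_TENANTS = {"company_y", "company_z"}


def route_for(tenant_id):
    """Resolve a tenant to its database and, if shared, its tenant filter."""
    if tenant_id in DEDICATED:
        return TenantRoute(DEDICATED[tenant_id], None)
    if tenant_id in SHARED_TENANTS:
        return TenantRoute(SHARED_DATABASE, tenant_id)
    raise KeyError(f"unknown tenant: {tenant_id}")
```

Row-level rules for users inside a tenant would then be applied on top of whichever route is returned.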

Remarks:

  • Scalability: Large customers don’t impact the performance of other tenants.
  • Security & Isolation: Enterprises with strict compliance needs can have dedicated databases.
  • Cost Optimization: Smaller clients share infrastructure efficiently.

Data Model Management

Regardless of the multi-tenancy approach, a strong data modeling strategy is critical. A DDL management framework like Flyway, combined with the Putki Flyway action plugin, is essential for database structure management.

Figure: Data model management with schema versioning, automated database migrations, and strict data access enforcement

Best Practices for Data Model Management:

  • Use schema versioning tools (e.g., Flyway)
  • Automate database migrations for tenant onboarding
  • Enforce strict data governance policies
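As a sketch of automating migrations during tenant onboarding, the snippet below builds one Flyway command line per tenant using the standard Flyway CLI options `-url`, `-schemas`, and `-locations`. It only constructs the commands and does not run them. The JDBC URLs, schema name, and migration location are hypothetical, and this does not show how the Putki Flyway action plugin is configured.

```python
# Hypothetical per-tenant JDBC URLs; replace with real connection details.
TENANT_JDBC_URLS = {
    "company_x": "jdbc:postgresql://db-x.example.internal:5432/analytics",
    "company_y": "jdbc:postgresql://db-shared.example.internal:5432/analytics",
}


def flyway_migrate_command(jdbc_url, schema, locations="filesystem:sql"):
    """Build the argument list for a 'flyway migrate' run against one database."""
    return [
        "flyway",
        f"-url={jdbc_url}",
        f"-schemas={schema}",
        f"-locations={locations}",
        "migrate",
    ]


def onboarding_commands(schema="public"):
    """Return one migrate command per tenant so every tenant gets the same schema version."""
    return {
        tenant: flyway_migrate_command(url, schema)
        for tenant, url in TENANT_JDBC_URLS.items()
    }
```

Each command could be passed to `subprocess.run` or wired into a workflow, with credentials supplied separately rather than hard-coded.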

Limitations & Challenges

Figure: Multi-tenancy limitations in Apache Hop: no authentication or built-in multi-tenancy

1. Permissions Management
  • Apache Hop lacks built-in authentication and authorization.
  • All permissions must be enforced at the source or target platform level (database, file system, etc.).
2. Hop Server Constraints
  • The current Apache Hop Server is not designed for multi-tenancy due to its single-user architecture.
  • Workarounds exist, such as routing via Nginx or deploying a separate Hop Server per tenant, but they increase complexity.

Blueprint Architecture for Multi-Tenancy

A recommended multi-tenant architecture includes the following components:

Figure: Multi-tenancy blueprint architecture in Apache Hop

1. Shared Apache Hop Project
  • A common project codebase that applies to all tenants.
2. Global Environment Files (Optional)
  • If all tenants use shared data sources, a global configuration file can be used.
  • This is optional and depends on the use case.
3. Tenant-Specific Environments

Each tenant has a dedicated environment configuration, customized based on the multi-tenancy approach:

Sharding:

  • Each tenant has a separate database, with unique connection details in the environment file.

Striping:

  • All tenants share the same database, with a tenant ID column for logical separation.
  • Administrators must enforce strict access control.
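The sketch below generates tenant-specific environment files for the two cases above. It assumes the JSON layout with a top-level `variables` list of name/value/description entries that Hop environment configuration files generally use, so verify the format against your Hop version. The variable names `DB_HOST`, `DB_NAME`, and `TENANT_ID` and the host names are hypothetical.

```python
import json


def environment_config(variables):
    """Build an environment config dict from a name -> value mapping."""
    return {
        "variables": [
            {"name": name, "value": value, "description": ""}
            for name, value in variables.items()
        ]
    }


# Sharding: each tenant gets its own connection details.
sharded = environment_config({
    "DB_HOST": "db-x.example.internal",
    "DB_NAME": "company_x",
})

# Striping: shared connection details plus the tenant ID used for filtering.
striped = environment_config({
    "DB_HOST": "db-shared.example.internal",
    "DB_NAME": "shared",
    "TENANT_ID": "company_y",
})


def write_environment(path, config):
    """Write one tenant's environment file as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
```

With this layout the shared project stays identical for every tenant, and only the environment file selected at run time changes.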

Conclusion

Apache Hop and Putki offer versatile multi-tenancy options, but choosing the right approach depends on security, scalability, and operational efficiency.

Key Takeaways:

  • Sharding offers strong isolation but is costly.
  • Striping is cost-effective but requires strict access control.
  • Data models allow granular control but need predefined schemas.
  • Hybrid solutions balance security, scalability, and cost.
  • Hop Server is not designed for multi-tenancy, so workarounds such as Nginx routing or a separate Hop Server per tenant are needed.

Don't miss the video below for a step-by-step walkthrough!




know.bi, Bart Maertens March 18, 2025