
Multi-Tenancy in Apache Hop

Introduction

In modern data-driven applications, multi-tenancy is essential for efficiently managing multiple customers, business units, or organizational divisions within a shared infrastructure. Apache Hop and Putki provide various multi-tenancy approaches at both the database and file system levels, ensuring flexibility in deployment and resource management.

This document explores different multi-tenancy strategies, their advantages and challenges, and a recommended blueprint architecture for implementing multi-tenancy in Apache Hop and Putki. While our focus is on relational databases, similar principles apply to NoSQL databases like Neo4j or MongoDB.

Multi-Tenancy Approaches

The most common type of multi-tenancy is data multi-tenancy, which allows developers to enforce custom data access rules at runtime. A tenant may only access data associated with their tenant ID. Below are four key approaches to multi-tenancy in Apache Hop and Putki:

1. Sharding (Database-Level Separation)

Each tenant's data is stored in a separate database or schema, ensuring strict physical separation between tenants.
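As a rough illustration of sharding, the sketch below resolves a tenant ID to the connection details of that tenant's own database. The registry, host names, and the `jdbc_url` helper are hypothetical and are not part of Apache Hop or Putki. In a real project these values would more likely live in tenant-specific environment variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardConnection:
    """Connection details for one tenant's dedicated database (hypothetical fields)."""
    host: str
    port: int
    database: str


# Hypothetical tenant registry: exactly one database per tenant.
SHARDS = {
    "company_x": ShardConnection("db-x.example.internal", 5432, "company_x"),
    "company_y": ShardConnection("db-y.example.internal", 5432, "company_y"),
}


def jdbc_url(tenant_id: str) -> str:
    """Return a PostgreSQL-style JDBC URL for the tenant's own shard."""
    try:
        shard = SHARDS[tenant_id]
    except KeyError:
        # Fail closed: an unknown tenant never falls back to another tenant's shard.
        raise LookupError(f"no shard registered for tenant {tenant_id!r}") from None
    return f"jdbc:postgresql://{shard.host}:{shard.port}/{shard.database}"
```

Failing on an unknown tenant, rather than falling back to a default database, keeps the physical separation intact even when the registry is incomplete.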

2. Striping (Shared Database, Tenant ID Column)

All tenants share a single database, but each table includes a tenant ID column to separate data logically.
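To make striping concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and the `sales_for_tenant` helper are assumptions made for illustration. The point is only that every query carries a tenant ID predicate.

```python
import sqlite3

# One shared in-memory database; every row is tagged with a tenant ID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (tenant_id TEXT NOT NULL, order_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("company_y", 1, 100.0), ("company_y", 2, 50.0), ("company_z", 3, 75.0)],
)


def sales_for_tenant(connection, tenant_id):
    """Return only the rows tagged with the given tenant ID."""
    cursor = connection.execute(
        # The tenant ID is bound as a parameter, never concatenated into the SQL.
        "SELECT order_id, amount FROM sales WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    )
    return cursor.fetchall()
```

Because the separation is purely logical, a single query that omits the `tenant_id` filter exposes every tenant's rows, which is why this approach depends on strict access control.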

3. Data Models (Row-Level Access Control)

Tenancy is enforced at the data level, allowing different tenants (or sub-tenants) to access only specific data.
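A small sketch of row-level access control might look like the following. The roles (`manager`, `sales_rep`), the `owner` field, and the `visible_rows` helper are hypothetical. In practice this rule would usually be enforced in the database itself rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    tenant_id: str
    role: str  # hypothetical roles: "manager" or "sales_rep"


# Illustrative data only: each row records its tenant and its owner.
SALES = [
    {"tenant_id": "company_y", "owner": "alice", "amount": 100.0},
    {"tenant_id": "company_y", "owner": "bob", "amount": 50.0},
    {"tenant_id": "company_z", "owner": "carol", "amount": 75.0},
]


def visible_rows(user, rows):
    """Apply the tenant boundary first, then the role rule inside the tenant."""
    own_tenant = [row for row in rows if row["tenant_id"] == user.tenant_id]
    if user.role == "manager":
        # Managers see every row of their own tenant.
        return own_tenant
    # Everyone else sees only the rows they own.
    return [row for row in own_tenant if row["owner"] == user.name]
```

The tenant filter is applied before any role check, so a role can never widen access beyond the user's own tenant.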

4. Hybrid (Combining Sharding, Striping, and Data Models)

This approach mixes elements of the previous three, depending on security, scalability, and performance needs.

A hybrid model is often used because no single approach fits every tenant's needs.

Example of a Hybrid Multi-Tenancy Approach

A Software-as-a-Service (SaaS) company provides a data analytics platform for multiple customers (tenants). Since their clients have different security, scalability, and performance requirements, they use a hybrid approach that combines sharding, striping, and data models.

Hybrid Approach Configuration
  1. Sharding for Large Enterprises
    • Large enterprise customers with strict security requirements and high data volumes are assigned a separate database (shard).
    • Example: "Company X" has a dedicated database due to data protection regulations.
  2. Striping for Medium-Sized Businesses
    • Mid-sized clients share the same database, but data is tagged with a Tenant ID to enforce access restrictions.
    • Example: "Company Y" and "Company Z" use the same database but can only access their own records.
  3. Data Models with Row-Level Security for Internal Users
    • Users within the same tenant have role-based access to specific rows of data.
    • Example: Within "Company Y," managers can view all sales reports, while individual sales reps can only see their own sales data.
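The configuration above can be sketched as a small lookup that decides, per tenant, which database to use and whether a tenant ID filter is needed. The tenant names follow the example, while the `Isolation` enum, the `shared_tenants` database name, and `resolve_target` are assumptions made for illustration.

```python
from enum import Enum


class Isolation(Enum):
    SHARDED = "sharded"   # dedicated database per tenant
    STRIPED = "striped"   # shared database, rows tagged with a tenant ID


# Hypothetical assignment that mirrors the example companies.
TENANTS = {
    "company_x": Isolation.SHARDED,
    "company_y": Isolation.STRIPED,
    "company_z": Isolation.STRIPED,
}

SHARED_DATABASE = "shared_tenants"  # assumed name of the shared database


def resolve_target(tenant_id):
    """Return (database name, tenant filter) for a tenant.

    The filter is None for sharded tenants because the database itself
    is the isolation boundary.
    """
    isolation = TENANTS[tenant_id]
    if isolation is Isolation.SHARDED:
        return tenant_id, None
    return SHARED_DATABASE, tenant_id
```

Row-level rules for users inside a tenant would then be applied on top of whichever target this lookup returns.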

Benefits of this setup:

  • Scalability: Large customers don’t impact the performance of other tenants.
  • Security & Isolation: Enterprises with strict compliance needs can have dedicated databases.
  • Cost Optimization: Smaller clients share infrastructure efficiently.

Data Model Management

Regardless of the multi-tenancy approach, a strong data modeling strategy is critical. A DDL management framework like Flyway, combined with the Putki Flyway action plugin, is essential for database structure management.

Best Practices for Data Model Management:

  • Use schema versioning tools (e.g., Flyway)
  • Automate database migrations for tenant onboarding
  • Enforce strict data governance policies
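One way to automate migrations during tenant onboarding is to generate one Flyway command line per tenant database. The sketch below only builds the commands and does not run them. The tenant registry, JDBC URLs, and the `filesystem:sql` location are assumptions, and the Putki Flyway action plugin is not modeled here. Credentials are left out on purpose, on the assumption that they are supplied through Flyway's environment variables or configuration files.

```python
import shlex

# Hypothetical tenant registry: tenant ID -> JDBC URL of the database to migrate.
TENANT_DATABASES = {
    "company_x": "jdbc:postgresql://db-x.example.internal:5432/company_x",
    "company_y": "jdbc:postgresql://db-shared.example.internal:5432/shared_tenants",
}


def flyway_migrate_command(jdbc_url, locations="filesystem:sql"):
    """Build the argument list for one `flyway migrate` run."""
    return ["flyway", f"-url={jdbc_url}", f"-locations={locations}", "migrate"]


def onboarding_commands(tenants):
    """Return one printable command line per tenant, ordered by tenant ID."""
    return [
        shlex.join(flyway_migrate_command(url))
        for _, url in sorted(tenants.items())
    ]


if __name__ == "__main__":
    for line in onboarding_commands(TENANT_DATABASES):
        print(line)
```

Running the same versioned migration scripts against every tenant database keeps the data model identical across tenants, whichever isolation approach each one uses.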

Limitations & Challenges

1. Permissions Management
  • Apache Hop lacks built-in authentication and authorization.
  • All permissions must be enforced at the source or target platform level (database, file system, etc.).
2. Hop Server Constraints
  • The current Apache Hop Server is not designed for multi-tenancy due to its single-user architecture.
  • Workarounds exist, such as routing via Nginx or deploying a separate Hop Server per tenant, but they increase complexity.
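The server-per-tenant workaround amounts to a routing table in front of several Hop Server instances. The sketch below shows the lookup a reverse proxy would perform. The host names, the port, and the path layout (`/<tenant>/<rest>`) are all hypothetical, and this is not an Nginx configuration.

```python
# Hypothetical mapping: tenant ID -> base URL of that tenant's own Hop Server.
UPSTREAMS = {
    "company_x": "http://hop-company-x.internal:8080",
    "company_y": "http://hop-company-y.internal:8080",
}


def route(path):
    """Map '/<tenant>/<rest>' to the matching URL on the tenant's own server."""
    parts = path.lstrip("/").split("/", 1)
    tenant_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    if tenant_id not in UPSTREAMS:
        # Unknown tenants are rejected instead of being sent to a default server.
        raise LookupError(f"no Hop Server registered for tenant {tenant_id!r}")
    return f"{UPSTREAMS[tenant_id]}/{rest}"
```

Each new tenant adds another server to deploy, monitor, and upgrade, which is where the extra complexity comes from.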

Blueprint Architecture for Multi-Tenancy

A recommended multi-tenant architecture includes the following components:

1. Shared Apache Hop Project
  • A common project codebase that applies to all tenants.
2. Global Environment Files (Optional)
  • If all tenants use shared data sources, a global configuration file can be used.
  • This is optional and depends on the use case.
3. Tenant-Specific Environments

Each tenant has a dedicated environment configuration, customized based on the multi-tenancy approach:

Sharding:

  • Each tenant has a separate database, with unique connection details in the environment file.

Striping:

  • All tenants share the same database, with a tenant ID column for logical separation.
  • Administrators must enforce strict access control.
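As a sketch of tenant-specific environments, the snippet below generates one configuration document per tenant. The variable names (`DB_NAME`, `TENANT_ID`) and the `shared_tenants` database are hypothetical. The JSON layout, a `variables` list of name, value, and description entries, is an assumption about the environment file format and should be checked against a file created by Hop itself.

```python
import json


def environment_config(variables):
    """Wrap plain name/value pairs in the assumed environment file layout."""
    return {
        "variables": [
            {"name": name, "value": value, "description": ""}
            for name, value in variables.items()
        ]
    }


def tenant_environment(tenant_id, sharded):
    """Return the JSON text of one tenant's environment file."""
    if sharded:
        # Sharding: the tenant gets its own database name.
        variables = {"DB_NAME": tenant_id, "TENANT_ID": tenant_id}
    else:
        # Striping: shared database, separation by tenant ID only.
        variables = {"DB_NAME": "shared_tenants", "TENANT_ID": tenant_id}
    return json.dumps(environment_config(variables), indent=2)


if __name__ == "__main__":
    print(tenant_environment("company_x", sharded=True))
    print(tenant_environment("company_y", sharded=False))
```

The shared project stays unchanged. Only the environment selected at run time decides which tenant's database and tenant ID a pipeline or workflow uses.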

Conclusion

Apache Hop and Putki offer versatile multi-tenancy options, but choosing the right approach depends on security, scalability, and operational efficiency.

Key Takeaways:

  • Sharding offers strong isolation but is costly.
  • Striping is cost-effective but requires strict access control.
  • Data models allow granular control but need predefined schemas.
  • Hybrid solutions balance security, scalability, and cost.
  • Hop Server is not designed for multi-tenancy, so workarounds such as Nginx routing or a separate Hop Server per tenant are needed.



Bart Maertens March 18, 2025