Introduction
In modern data-driven applications, multi-tenancy is essential for efficiently managing multiple customers, business units, or organizational divisions within a shared infrastructure. Apache Hop and Putki provide various multi-tenancy approaches at both the database and file system levels, ensuring flexibility in deployment and resource management.
This document explores different multi-tenancy strategies, their advantages and challenges, and a recommended blueprint architecture for implementing multi-tenancy in Apache Hop and Putki. While our focus is on relational databases, similar principles apply to NoSQL databases like Neo4j or MongoDB.
Multi-Tenancy Approaches
The most common type of multi-tenancy is data multi-tenancy, which allows developers to enforce custom data access rules at runtime. A tenant may only access data associated with their tenant ID. Below are four key approaches to multi-tenancy in Apache Hop and Putki:
1. Sharding (Database-Level Separation)
Each tenant's data is stored in a separate database or schema, ensuring strict physical separation between tenants.
2. Striping (Shared Database, Tenant ID Column)
All tenants share a single database, but each table includes a tenant ID column to separate data logically.
3. Data Models (Row-Level Access Control)
Access rules are enforced on individual rows, so each tenant (or sub-tenant, such as a role or user within a tenant) can read only the rows it is entitled to.
4. Hybrid (Combining Sharding, Striping, and Data Models)
This approach mixes elements of the previous three, depending on security, scalability, and performance needs.
A hybrid model is often used because no single approach fits every tenant's needs.
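To make the striping approach concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sales`, the column `tenant_id`, and the tenant names are assumptions chosen for illustration; they do not come from Apache Hop or Putki. The point is only that every query goes through a helper that always filters on the tenant ID.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory shared database with a tenant_id column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "tenant_id TEXT NOT NULL, item TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("company_y", "widget", 100.0),
            ("company_y", "gadget", 50.0),
            ("company_z", "widget", 75.0),
        ],
    )
    return conn


def sales_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> list:
    """Return only the rows that belong to the given tenant."""
    cursor = conn.execute(
        # The tenant filter is a bound parameter, never string-concatenated.
        "SELECT item, amount FROM sales WHERE tenant_id = ? ORDER BY item",
        (tenant_id,),
    )
    return cursor.fetchall()
```

With sharding, the same helper would need no `WHERE tenant_id = ?` clause, because the connection itself would already point at the tenant's own database.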
Example of a Hybrid Multi-Tenancy Approach
A Software-as-a-Service (SaaS) company provides a data analytics platform for multiple customers (tenants). Since their clients have different security, scalability, and performance requirements, they use a hybrid approach that combines sharding, striping, and data models.
Hybrid Approach Configuration
- Sharding for Large Enterprises
  - Large enterprise customers with strict security requirements and high data volumes are assigned a separate database (shard).
  - Example: "Company X" has a dedicated database due to data protection regulations.
- Striping for Medium-Sized Businesses
  - Mid-sized clients share the same database, but data is tagged with a Tenant ID to enforce access restrictions.
  - Example: "Company Y" and "Company Z" use the same database but can only access their own records.
- Data Models with Row-Level Security for Internal Users
  - Users within the same tenant have role-based access to specific rows of data.
  - Example: Within "Company Y," managers can view all sales reports, while individual sales reps can only see their own sales data.
Remarks:
- Scalability: Large customers don’t impact the performance of other tenants.
- Security & Isolation: Enterprises with strict compliance needs can have dedicated databases.
- Cost Optimization: Smaller clients share infrastructure efficiently.
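The hybrid example above can be sketched as a small lookup plus two filters. This is an illustration under stated assumptions, not an Apache Hop or Putki API: the `PLACEMENTS` registry, the database names, the role name `manager`, and the row fields `tenant_id` and `sales_rep` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenantPlacement:
    database: str             # which database the tenant's data lives in
    tenant_id: Optional[str]  # None means a dedicated shard, no tenant column needed


# Hypothetical registry: Company X is sharded, Y and Z are striped in a shared database.
PLACEMENTS = {
    "company_x": TenantPlacement("shard_company_x", None),
    "company_y": TenantPlacement("shared_db", "company_y"),
    "company_z": TenantPlacement("shared_db", "company_z"),
}


def database_for(tenant: str) -> str:
    """Sharding step: pick the database that holds this tenant's data."""
    return PLACEMENTS[tenant].database


def visible_rows(rows: list, tenant: str, role: str, user: str) -> list:
    """Apply the striping filter, then the row-level (role-based) filter."""
    placement = PLACEMENTS[tenant]
    if placement.tenant_id is not None:
        # Striping: keep only rows tagged with this tenant's ID.
        rows = [r for r in rows if r["tenant_id"] == placement.tenant_id]
    if role != "manager":
        # Row-level rule: non-managers see only their own sales.
        rows = [r for r in rows if r["sales_rep"] == user]
    return rows
```

In a real deployment the filters would normally be pushed down to the database (for example through views or the database's own row-level security), since, as noted later, Apache Hop itself does not enforce permissions.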
Data Model Management
Regardless of the multi-tenancy approach, a strong data modeling strategy is critical. A DDL management framework like Flyway, combined with the Putki Flyway action plugin, is essential for database structure management.
Best Practices for Data Model Management:
- Use schema versioning tools (e.g., Flyway)
- Automate database migrations for tenant onboarding
- Enforce strict data governance policies
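As a rough sketch of automated migrations during tenant onboarding, the helper below builds one Flyway command line per tenant. It uses the plain Flyway command-line tool rather than the Putki Flyway action plugin, whose configuration is not described here. The JDBC URLs, schema names, and the `filesystem:sql` migration folder are assumptions; the function only assembles the arguments and does not run anything.

```python
from typing import Optional


def flyway_migrate_command(
    jdbc_url: str,
    schema: Optional[str] = None,
    locations: str = "filesystem:sql",
) -> list:
    """Build the argument list for a 'flyway migrate' run against one tenant."""
    cmd = ["flyway", f"-url={jdbc_url}", f"-locations={locations}"]
    if schema is not None:
        # Schema-per-tenant sharding: point Flyway at the tenant's schema.
        cmd.append(f"-schemas={schema}")
    cmd.append("migrate")
    return cmd


def onboarding_commands(tenants: dict) -> dict:
    """Map each tenant name to its migrate command (tenants: name -> (url, schema))."""
    return {
        name: flyway_migrate_command(url, schema)
        for name, (url, schema) in tenants.items()
    }
```

Each returned list could be passed to `subprocess.run` in an onboarding script, with credentials supplied through environment variables or a Flyway configuration file rather than on the command line.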
Limitations & Challenges
1. Permissions Management
- Apache Hop lacks built-in authentication and authorization.
- All permissions must be enforced at the source or target platform level (database, file system, etc.).
2. Hop Server Constraints
- The current Apache Hop Server is not designed for multi-tenancy due to its single-user architecture.
- Workarounds exist, such as routing via Nginx or deploying a separate Hop Server per tenant, but they increase complexity.
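The per-tenant server workaround boils down to a routing table in front of the servers. The sketch below shows that lookup in Python purely to illustrate the idea; in practice the rule would live in the reverse proxy configuration (for example Nginx). The host names, the port, and the convention that the first path segment names the tenant are all assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical mapping of tenant to its own Hop Server instance.
UPSTREAMS = {
    "company_x": "http://hop-company-x:8080",
    "company_y": "http://hop-company-y:8080",
}


def resolve_upstream(request_path: str) -> str:
    """Map '/<tenant>/<rest>' to '<tenant upstream>/<rest>'."""
    parts = urlsplit(request_path).path.strip("/").split("/", 1)
    tenant = parts[0]
    if tenant not in UPSTREAMS:
        # Unknown tenants are rejected instead of falling back to a default server.
        raise KeyError(f"no Hop Server configured for tenant {tenant!r}")
    rest = parts[1] if len(parts) > 1 else ""
    return f"{UPSTREAMS[tenant]}/{rest}"
```

Every new tenant adds one entry to the table and one server to operate, which is where the extra complexity mentioned above comes from.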
Blueprint Architecture for Multi-Tenancy
A recommended multi-tenant architecture includes the following components:
1. Shared Apache Hop Project
- A common project codebase that applies to all tenants.
2. Global Environment Files (Optional)
- If all tenants use shared data sources, a global configuration file can be used.
- This is optional and depends on the use case.
3. Tenant-Specific Environments
Each tenant has a dedicated environment configuration, customized based on the multi-tenancy approach:
Sharding:
- Each tenant has a separate database, with unique connection details in the environment file.
Striping:
- All tenants share the same database, with a tenant ID column for logical separation.
- Administrators must enforce strict access control.
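The difference between the two kinds of tenant environment can be sketched by generating the environment files. The JSON layout below (a `variables` list of name, value, and description entries) is an assumption about the environment file format and should be checked against a file created by your Hop version; the variable names `DB_HOST`, `DB_NAME`, and `TENANT_ID` are hypothetical.

```python
import json


def environment_config(variables: dict) -> str:
    """Render variables as an environment file (assumed JSON layout)."""
    entries = [
        {"name": name, "value": value, "description": ""}
        for name, value in variables.items()
    ]
    return json.dumps({"variables": entries}, indent=2)


def sharded_environment(tenant: str, db_host: str) -> str:
    """Sharding: each tenant's file carries its own connection details."""
    return environment_config({"DB_HOST": db_host, "DB_NAME": f"{tenant}_db"})


def striped_environment(tenant: str, shared_host: str, shared_db: str) -> str:
    """Striping: the connection is shared and only the tenant ID differs."""
    return environment_config(
        {"DB_HOST": shared_host, "DB_NAME": shared_db, "TENANT_ID": tenant}
    )
```

The shared project then refers to these variables in its database connection and queries, so the same pipelines run unchanged for every tenant.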
Conclusion
Apache Hop and Putki offer versatile multi-tenancy options, but choosing the right approach depends on security, scalability, and operational efficiency.
Key Takeaways:
- Sharding offers strong isolation but is costly.
- Striping is cost-effective but requires strict access control.
- Data models allow granular control but need predefined schemas.
- Hybrid solutions balance security, scalability, and cost.
- Hop Server is not designed for multi-tenancy, so workarounds such as proxy routing or a separate server per tenant are needed.
Don't miss the video below for a step-by-step walkthrough!
Need help implementing multi-tenancy?
Our team is here to assist you with setup, best practices, and troubleshooting. Get expert guidance to optimize your deployment and ensure scalability.
Multi-Tenancy in Apache Hop