About Us

Citrus Consulting Services is the Consulting and the Transformation Services arm of Redington Gulf.


Deployment of the High Availability Active Passive Architecture Using Pentaho Data Integration

Customer Introduction

Flydubai, legally Dubai Aviation Corporation, is a government-owned budget airline in Dubai, United Arab Emirates, with its head office and flight operations in Terminal 2 of Dubai International Airport. The airline serves 95 destinations across the Middle East, Africa, Asia and Europe from Dubai.

Citrus Consulting Services Enables Airline Company with Deployment of the High Availability Active Passive Architecture Using Pentaho Data Integration

Challenge Overview

The customer faced the following challenges, which led them to evaluate a High Availability (HA) architecture:

  • Flydubai needed a decentralized system that would make its jobs and transformations highly available, secure, reliable, scalable and cost-effective.
  • Flydubai wanted an approach that could withstand failure of the server hosting the jobs and transformations.
  • Flydubai had a critical requirement to host across multiple availability zones to ensure business continuity and high availability.
  • Flydubai also required a robust and reliable disaster recovery setup for unforeseen circumstances, based on incidents it had faced previously.

Solution Overview

Architecture of Pentaho Data Integration

With the rapid growth of data, extracting real value from big data has become a complex task for organizations like Flydubai.

Pentaho provides the Streamlined Data Refinery (SDR), a flexible, economical way to process data and automate the delivery of information to many users for many analytical purposes. It streamlines the data delivery process and empowers business users such as those at Flydubai. The design pattern covers an on-demand process driven by user-initiated data requests: blending and refining the data, automatically generating an analysis schema, and publishing analytical data sets in any format.

Pentaho’s highly scalable data integration engine, managed through its intuitive end user interface, provides the glue between the various data sources and stores in this architecture. This process can be actioned on-demand using PDI:

  • Blending & Orchestration: PDI absorbs data from any data source and then processes, cleanses and blends the data to drive insight.
  • Automatic Modelling & Publishing: PDI, as part of the data orchestration process, creates a schema and publishes it to an analytics or database server for end user visualization.
  • Governance: It can promptly validate data sources blended at the source, allowing for the right measure of control. Governed Data Delivery is the delivery of blended, trusted and timely data to power analytics, regardless of positions.

Pentaho High Availability Architecture

Most installations of Pentaho are single-server installations. This solution works well in small- and medium-sized organizations where users and developers are limited to a handful of people. However, in large scale deployments, a clustered High Availability (HA) solution is needed to address the increase in data processing and concurrent user connections.

Client requests and application loads are distributed across many servers in the same datacenter, or across many datacenters, in either active-active or active-passive mode. When routing policies are in place, requests and application load fail over automatically if the primary servers fail.

For Flydubai, we deployed load balancing in active-passive mode, with two servers pointing to a common database repository. One server is up and is called the active server; the other is kept on standby and is called the passive server. While one server is active, the other remains passive. When the active server fails, the passive server becomes active without disrupting the scheduled jobs and transformations. High availability is achieved with the help of an HTTP proxy load-balancing server.
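The active-passive behaviour described above can be illustrated with a minimal Python sketch. It is not the Flydubai configuration or any Pentaho API: the `Node` and `ActivePassiveRouter` classes, the node names `pdi-1` and `pdi-2`, and the in-memory `healthy` flag are all assumptions made for illustration. In a real deployment the proxy would obtain health status from a probe against each server.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one PDI server behind the proxy."""
    name: str
    healthy: bool = True  # assumption: set by a health probe in a real setup


class ActivePassiveRouter:
    """Sends all traffic to one active node and promotes the standby on failure."""

    def __init__(self, active: Node, passive: Node) -> None:
        self.active = active
        self.passive = passive

    def route(self) -> str:
        # Fail over only when the active node is down and the standby is usable.
        if not self.active.healthy:
            if not self.passive.healthy:
                raise RuntimeError("no healthy node available")
            # Swap roles: the former standby now serves all requests.
            self.active, self.passive = self.passive, self.active
        return self.active.name


primary = Node("pdi-1")
standby = Node("pdi-2")
router = ActivePassiveRouter(primary, standby)

before = router.route()   # primary serves while it is healthy
primary.healthy = False   # simulate a failure of the active server
after = router.route()    # standby is promoted
print(before, after)      # prints: pdi-1 pdi-2
```

Because both servers point to the same repository, the promoted node can pick up the same scheduled jobs. Note that this sketch does not fail back automatically when the original server recovers; whether to do so is a design choice.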

Project Timelines

The project followed these stages:

  • Requirements gathering and platform setup.
  • PDI Enterprise Edition is deployed in high availability mode using the active-passive approach, with a Tomcat proxy server acting as the load balancer and SQL Server as the database repository.
  • The PDI machines are whitelisted in the source systems.
  • The JAR files and plugins required for PDI to run smoothly are added.
  • ETL jobs and transformations are migrated from the existing setup to the PDI HA machines.
  • The scheduler is set up to run the jobs at specified intervals, based on the existing setup.
  • Version control is configured to maintain change management within the PDI repository.
  • After the ETL pipelines are set up, the flow is verified and the data is validated on the customer's end.
  • The project is pushed for production release.
  • After training and knowledge transfer (KT), sign-off marks project completion.
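The case study does not say how the migrated data was validated. One common approach, sketched here in Python purely as an assumption, is to compare a row count and an order-independent checksum of the same table as produced by the existing setup and by the new PDI HA machines. The function name `table_fingerprint` and the sample rows are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple


def table_fingerprint(rows: Iterable[tuple]) -> Tuple[int, str]:
    """Return (row count, order-independent checksum) for a collection of rows."""
    count = 0
    combined = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Summing the hashes modulo 2**256 makes the result independent of row order.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


# Hypothetical output of the same job on the old and the new platform.
legacy_rows = [(1, "DXB"), (2, "KHI"), (3, "DOH")]
migrated_rows = [(3, "DOH"), (1, "DXB"), (2, "KHI")]  # same rows, different order
corrupted_rows = [(1, "DXB"), (2, "KHI"), (3, "XXX")]

matches = table_fingerprint(legacy_rows) == table_fingerprint(migrated_rows)
differs = table_fingerprint(legacy_rows) != table_fingerprint(corrupted_rows)
print(matches, differs)  # prints: True True
```

A fingerprint mismatch only signals that the two result sets differ; locating the differing rows needs a row-level comparison.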

Solution Outcome

  • Considerable drop in the overall cost of operations for Flydubai.
  • 100% business continuity for Flydubai's applications and operations.
  • Successful implementation and testing of failover and failback in the Disaster Recovery scenario.
  • Meticulous analysis and testing of application latency.

Key Datapoints

  • HA deployment can spread Pentaho application footprints across many servers that are either co-located or in different data centers. This serves growing application loads and reduces the CPU, memory, and I/O strain on any single server.
  • HA deployment can reduce application latency by routing client requests or traffic to servers that are geographically close to application clients, avoiding long network round trips.
  • HA deployment can provide a convenient way to achieve geolocation-based routing where organizations require that client requests be restricted to data stored within certain geographic areas, based on point of origin.
  • HA deployment removes the need for a single-server deployment of Pentaho, or any other application, to go offline occasionally for planned or emergency maintenance.
  • HA deployment can be used to route traffic from a primary to a backup set of servers to prepare business continuity plans for Flydubai.
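The routing ideas in the list above (prefer nearby servers, fall back from a primary to a backup set) can be sketched in Python as follows. This is an illustrative routing policy, not how Pentaho or any specific load balancer implements it. The function `pick_server`, the server names and the latency figures are assumptions; a latency of `None` marks a server that did not answer its probe.

```python
from typing import Dict, List, Optional


def pick_server(
    latency_ms: Dict[str, Optional[float]],
    primary: List[str],
    backup: List[str],
) -> str:
    """Return the lowest-latency reachable server, preferring the primary set."""
    for group in (primary, backup):
        # Keep only servers in this group that answered the latency probe.
        reachable = {
            name: latency_ms[name]
            for name in group
            if latency_ms.get(name) is not None
        }
        if reachable:
            return min(reachable, key=lambda name: reachable[name])
    raise RuntimeError("no reachable server in primary or backup set")


primary_set = ["dxb-1", "dxb-2"]   # hypothetical primary site servers
backup_set = ["dr-1"]              # hypothetical disaster recovery server

normal = pick_server({"dxb-1": 12.0, "dxb-2": 8.5, "dr-1": 40.0}, primary_set, backup_set)
outage = pick_server({"dxb-1": None, "dxb-2": None, "dr-1": 40.0}, primary_set, backup_set)
print(normal, outage)  # prints: dxb-2 dr-1
```

Traffic moves to the backup set only when no primary server is reachable, which matches the primary-to-backup routing described in the last bullet.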

Lessons Learned

The HA deployment showed how to achieve scalability, reduced application latency, easier application maintenance and disaster recovery in a Pentaho environment. It also helps achieve data security.
