Flydubai, legally Dubai Aviation Corporation, is a government-owned budget airline based in Dubai, United Arab Emirates, with its head office and flight operations in Terminal 2 of Dubai International Airport. The airline serves a total of 95 destinations across the Middle East, Africa, Asia and Europe from Dubai.
The customer encountered the following challenges, which led them to evaluate and leverage the capabilities of a High Availability (HA) architecture:
- Flydubai needed a decentralized system to make its jobs and transformations highly available, secure, reliable, scalable and cost-effective.
- Flydubai was interested in an approach that overcomes failures of the server hosting the jobs and transformations.
- Flydubai had a critical requirement to host across multiple availability zones in order to ensure business continuity and high availability.
- Flydubai also had a fundamental requirement for a robust and reliable disaster recovery setup, informed by outages it had previously encountered.
With the rapid growth of data, it has become a complex task for business organizations like Flydubai to extract real value from big data.
Pentaho provides the innovative Streamlined Data Refinery (SDR), a flexible, economical way to process and automate the delivery of information to many users for many analytical purposes. It sets a new standard for data delivery by streamlining the process and empowering business users like Flydubai. The design pattern supports an on-demand flow: user-initiated data requests, blending and refining of the data, automatic analysis schema generation, and the ability to publish analytical data sets in any format.
Pentaho’s highly scalable data integration engine, managed through its intuitive end-user interface, provides the glue between the various data sources and stores in this architecture. This process can be run on demand using PDI:
- Blending & Orchestration: PDI absorbs data from any data source and then processes, cleanses and blends the data to drive insight.
- Automatic Modelling & Publishing: PDI, as part of the data orchestration process, creates a schema and publishes it to the analytics or database server for end-user visualization.
- Governance: PDI can promptly validate blended data at the source, allowing for the right measure of control. Governed Data Delivery is the delivery of blended, trusted and timely data to power analytics, regardless of where the data resides.
Most installations of Pentaho are single-server installations. This solution works well in small- and medium-sized organizations where users and developers are limited to a handful of people. However, in large-scale deployments, a clustered High Availability (HA) solution is needed to address the increase in data processing and concurrent user connections.
Client requests and application loads are distributed across many servers in the same datacenter, or across multiple datacenters, in either active-active or active-passive mode. With routing policies in place, requests and application load automatically fail over if the primary servers fail.
We deployed active-passive load balancing for Flydubai, where two servers point to a common database repository. One server is up and is called the active server; the other is on standby and is called the passive server. When one server is in active mode, the other is in passive mode. When the active server fails, the passive server becomes active without disturbing the scheduled jobs and transformations. High availability is achieved with the help of an HTTP proxy load-balancing server.
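As an illustration, an active-passive pair like this can be fronted by an Apache HTTP Server reverse proxy using mod_proxy_balancer, with the standby node marked as a hot spare. This is a minimal sketch only; the hostnames, ports, and paths below are hypothetical, and the exact proxy product and settings used in the Flydubai deployment may differ.

```apache
# Hypothetical active-passive balancer for two Pentaho/PDI server nodes.
<Proxy "balancer://pentaho-ha">
    # Active node: receives all traffic while healthy
    BalancerMember "http://pdi-node1:8080" route=node1
    # Passive node: "+H" marks it as a hot standby,
    # used only when the active member is unavailable
    BalancerMember "http://pdi-node2:8080" route=node2 status=+H
</Proxy>

# Sticky sessions keep a user's requests pinned to the same node
ProxyPass        "/pentaho" "balancer://pentaho-ha/pentaho" stickysession=JSESSIONID
ProxyPassReverse "/pentaho" "balancer://pentaho-ha/pentaho"
```

Because both nodes point at the same database repository, a failover to the standby node sees the same jobs, transformations, and schedules as the node that went down.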
The project starts with requirement gathering and platform setup. PDI Enterprise Edition is deployed in High Availability mode using the active-passive approach. A Tomcat proxy server acts as the load balancer, and SQL Server hosts the database repository. The PDI machines are whitelisted in the source systems, and the JAR files and plugins required for smooth running of PDI are added. ETL jobs and transformations are migrated from the existing setup into the PDI HA machines. The scheduler is set up to run the jobs at specified intervals matching the existing setup, and version control is configured to maintain change management within the PDI repository. After the ETL pipelines are set up, the flow is verified and the data is validated on the customer's end. The project can then be pushed for production release. After training and knowledge transfer (KT), signoff marks the project's completion.
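For instance, a PDI job stored in the shared repository can be scheduled on the active node with cron and PDI's Kitchen command-line tool. The repository name, credentials, job path, and log location below are placeholders for illustration, not Flydubai's actual values.

```cron
# Hypothetical crontab entry on the active PDI node: run the nightly
# "load_bookings" job from the shared repository every day at 02:00.
0 2 * * * /opt/pentaho/data-integration/kitchen.sh \
    -rep=pdi_repo -user=admin -pass='***' \
    -dir=/etl -job=load_bookings -level=Basic \
    >> /var/log/pdi/load_bookings.log 2>&1
```

Because the schedule references the job by its repository path rather than a local file, the same crontab can be mirrored on the passive node, so schedules survive a failover.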
The HA deployment demonstrates how to achieve scalability, reduced application latency, simpler application maintenance, and disaster recovery in a Pentaho environment. It also helps achieve data security.