About Us

Citrus Consulting Services is the Consulting and the Transformation Services arm of Redington Gulf.

Sunday – Thursday: 9:00AM–6:00PM (Sales), Sunday – Saturday: 24×7 / 365 (Support) E.O#3, Ground Floor, Building 01 Dubai Internet City, P.O Box 501 761 Dubai, UAE (+971) 04 516 1500
(+966) 11 462 5323
Image Alt

Next Gen Data Warehouse Solution for Budget Airlines

Customer Introduction

Customer is a Saudi budget airline based in Jeddah. Airlines serves major popular domestic destinations in and around Saudi, it targets domestic travelers, Pilgrims and rising tourists in Saudi with its custom package, pricing, best in class flight experience and destination connectivity options.

Citrus Consulting Services Enables Airline Company with Next Gen Data Warehouse Solution. Future Ready Data Warehouse is Implemented with Reduced Cost and Improved Performance

Challenge Overview

  • Customer needs a scalable and cost-effective Next Gen Data Warehouse solution compared to existing RDBMS solutions.
  • Customer wants to optimize current data in the interpretation of analytical service usage and better schemas to create single point of truth.
  • Customer wants to explore cost effective solutions compared to current stack irrespective of Cloud or on-premise solutions.
  • Customer wants to replace Power BI datasets with New Optimized Datasets to enhance performance of the Dashboard.

Solution Overview

Amazon Redshift

Amazon Redshift is a fully managed, MPP, petabyte-scale, column oriented, cloud based data warehouse designed for large data set storage and analysis at scale. Redshift is designed to connect SQL based clients and BI tools, making data available to users at real time. Redshift stands apart from on premises and existing data warehouse solutions in the following areas:

  • Performance: Redshift delivers the fast query speeds on large data sets, dealing with data sizes up to a petabyte and more. Redshift processes data up to these sizes is just simply impossible to attain in traditional data warehousing, making it the top choice for applications that run massive amounts of queries on-demand.
  • Cost: Redshift has no substantial upfront costs to get setup, which is not in the case for other data warehouse solutions. Being fully managed, Redshift has no recurrent hardware and maintenance costs. on-demand pricing ensures you only pay for what you use.
  • Scalability: Redshift allows more flexible and elastic scale. As your requirements change, Redshift can scale up or down instantly to match your capacity and performance needs with a few clicks in the management console.
  • Security: Amazon Redshift has most security features of AWS. Credentials and access are granted and managed on the AWS-level through IAM. Inbound access control created through Cluster security group. Data encryption can be enabled upon cluster creation and cannot be switched from encrypted to unencrypted directly. For organizations, with private cloud, access through a Virtual Private Cloud (VPC) environment is available. For data in transit, Redshift uses SSL encryption to communicate with S3 or Amazon Dynamo DB for COPY, UNLOAD, backup, and restore operations.

Pentaho Data Integration

Pentaho Data Integration provides the Extract, Transform, and Load functionalities that facilitate the process of capturing, cleaning and storing data. The data can be stored in a uniform and consistent format that is easily accessible and relevant to end users. Common uses of PDI includes:

  • Data migration among different databases, file systems and applications.
  • Loading massive data sets into databases by taking full advantage of cloud, clustered and tremendously parallel processing environments
  • Data cleansing, which includes steps ranging from very simple to very complex transformations.
  • Data integration which includes the ability to control real-time ETL as a data source for reinforced reporting engines.
  • Data warehouse support for slowly changing dimensions of data

Solution Architecture

Solution for Customer is hosted in AWS environment with Redshift database and PDI deployed on EC2 instance. The source data was residing in SQL Database. Development server’s IP’s on which PDI is residing is whitelisted to grant access to the database. Data from SQL is pushed to Redshift using the pipelines created on user friendly GUI of PDI. The data refresh is scheduled in an incremental fashion once a day at same time (similar to existing setup). Then the relevant tables from Redshift are connected to Power BI, where team has written queries to optimize the datasets before loading in to Power BI. Current solution replaces Increased cost, slow performing system into High Performance and cost effective scalable system.

Project Timelines

Project kicks off with Requirement gathering, setting up data warehouse and PDI platform with required configurations. PDI and Data warehouse servers are Whitelisted to access Source systems and SQL server. Implementation phase starts with Data modeling and New schema creation.

Data modeling and New schema creation is based on the source systems and Integrating them with Redshift. Pipelines are set up using PDI which helps in loading the data from source systems to Redshift on daily and real time basis, as per the requirement. Post data refresh, a data gateway is set up to connect Redshift into Power BI and scheduling refresh cycle based on business needs. Power BI datasets are mapped with Redshift tables. The reports and dashboards are reconfigured based on new datasets. Post mapping, Unit testing and Validation has been performed. Post confirmation of UAT and validation from customer side, Solution is thus pushed for production. Training to customer team, project handover to customer and Signoff ends up the project.

Benefits Delivered to Customer

  • Future ready data warehouse is implemented with reduced cost and improved performance.
  • Current setup of scalable data warehouse, PDI as ETL engine for migration makes considerable drop in overall operations cost.
  • Fully managed Redshift services reduces the RTO and RPO in huge margin compared to Current solution and setup.
  • Data modeling with new schema has been able to achieve Single point of truth and much needed performance improvement while live querying and real time datasets push.
  • With New schema and tables, Power BI is able to accommodate more real time and historical datasets compared to current solution.

Project Info