About Us

Citrus Consulting Services is the Consulting and the Transformation Services arm of Redington Gulf.

Sunday – Thursday: 9:00AM–6:00PM (Sales), Sunday – Saturday: 24×7 / 365 (Support) E.O#3, Ground Floor, Building 01 Dubai Internet City, P.O Box 501 761 Dubai, UAE (+971) 04 516 1500
(+966) 11 462 5323
info@citrusconsulting.com
Citrus Consulting Services

Pentaho Metadata Injection

Pentaho is popularly known as an open source Business Intelligence package. Pentaho has ETL, analysis, metadata and reporting capabilities. This BI tool helps customers to identify the benefits of big data and helps in obtaining a cost-effective, agile and productive cloud delivery model.

The flexibility of Pentaho has helped it to be adopted by major enterprise and business groups today. Pentaho breaks the barrier that blocks the company’s ability to extract the valuable data. Pentaho further helps converting the data into value.

Flexibility of Pentaho hides behind:

  • Data delivery: It merges timely data for powerful data analytics at scale for all users in all environments
  • Platform Fusion: Pentaho supports multi-tenant architecture, which allows the fusion of analytics into different workflow application like Cloud, mobile and hybrid data models.
  • Amalgamation PDI helps in amalgamation of data from multiple sources and Provides flexibility of analytics and helps in turning big data into valuable insights.
  • User Friendly visual toolsThe visual drag and drop tools keep users away from the complexity of coding.

Metadata Injection is a feature of Pentaho which helps to integrate Data from different platforms at runtime and helps in reducing repetitive ETL tasks. The following diagram depicts the process of metadata injection.

Considering an example, Let us assume that, we need to select the values from three different tables (customer, products, order) from a MSSQL database and copy the values in another three tables (meta_customer, meta_products, meta_order) of same MSSQL (or any other Database servers) Database. Without Metadata injection step this could be tedious job to create connections every time and create multiple steps for the process.

With Metadata injection step, we can create a template which will be capable of accepting different queries and generate the desired outputs. Metadata injection step hence helps us to inject the queries to the template. Let’s now consider how it can be done.

Step 1: Create a Template Transformation

  • Create an empty transformation.
  • Drag a table input step on to canvas
  • Provide all the connection details of MSSQL DB in table input step and test for connection and click ok
  • Do not provide any SQL Query since it is a template and click ok.
  • Drag a Table output step on to canvas
  • Provide all the connection details of MSSQL DB in Table output step and test for connection and click ok
  • Select the schema but do not provide any target table since it is a template and click ok.
  • Save the transformation as temp_transformation.ktr

Step 2: Create a Set Query Transformation.

  • Create an empty transformation
  • Drag a data grid on to the canvas
  • Under Meta tab, under Names, add cust_query and target_table_name as type strings
  • Under Data, under cust_query add the queries select * from customers, select * from products and select * from orders.
  • Under Data, under target_table_name add the name of the output target tables, meta_customer, meta_product, meta_orders and click ok.
  • Add copy rows to result step on to the canvas and connect it from data grid
  • Save the transformation as set_query_for_template.ktr

Step 3: Create a Meta Main Transformation.

(Since we have created a template and a query to be passed to template, we now need a carrier that carries our queries to the template. This can be achieved by metadata injection step.)

  • Open an empty transformation
  • Add get rows from result step on to the canvas
  • Under get rows from result, under Field name, add two fields cust_query and target_table_name as string and click ok.
  • Add ETL Metadata Injection step on to the canvas.
  • Under ETL Metadata Injection, under transformation, browse for temp_transformation.ktr file.
  • Under inject metadata, under Table input, SQL, under Source step, select Get rows from result and select Source field as cust_query and click ok.
  • Under inject metadata, under Table output, Target table, under Source step, select Get rows from result and select Source field as target_table_name click ok.
  • Save the file as meta_main.ktr

Step 4: Create a Job to Schedule the Entire Process.

  • Create a new empty job
  • Add start step to the canvas and set the interval in which the job needs to be run.
  • Add transformation step to the canvas and call it set queries and connect it from start step and under transformation browse for set_query_for_template.ktr file and click ok.
  • Add one more transformation step to the canvas and call it injection and connect it from set queries step and under transformation browse for meta_main.ktr file and click ok.
  • Save the job as Metadata_injection_job.kjb

Now we can run the Metadata_injection_job and observe that the new tables, meta_customer, meta_product and meta_orders are created as required in the MSSQL Database.

Amulya is an associate consultant in Citrus Consulting Serveries, Bangalore. She has secured master's degree in Computer networks and currently pursuing her Ph.D. She is a Data Analytics professional with hands on experience on designing dashboards using various Visualization tools, data scraping using Python and ETL tools for different customers of Middle East. She has experience in Data Warehouse deployment and involved in creating Machine learning models for the customers.

Post a Comment

15 + nine =