Pentaho Metadata Injection
Pentaho is a popular open-source Business Intelligence (BI) suite with ETL, analysis, metadata, and reporting capabilities. It helps customers realize the benefits of big data and supports a cost-effective, agile, and productive cloud delivery model.
This flexibility has led to Pentaho's adoption by major enterprises and business groups. Pentaho breaks down the barriers that block a company's ability to extract valuable data, and then helps convert that data into value.
Pentaho's flexibility rests on:
- Data delivery: it merges timely data for powerful analytics at scale, for all users in all environments.
- Platform fusion: Pentaho's multi-tenant architecture allows analytics to be embedded into different workflow applications, such as cloud, mobile, and hybrid data models.
- Amalgamation: PDI (Pentaho Data Integration) amalgamates data from multiple sources, provides flexible analytics, and helps turn big data into valuable insights.
- User-friendly visual tools: drag-and-drop design shields users from the complexity of coding.
Metadata Injection is a Pentaho feature that integrates data from different platforms at runtime and reduces repetitive ETL work. The following diagram depicts the process of metadata injection.
As an example, assume we need to select rows from three tables (customers, products, orders) in an MSSQL database and copy them into three other tables (meta_customer, meta_product, meta_orders) in the same MSSQL database (or any other database server). Without the Metadata Injection step this is a tedious job: we would have to create a connection and a separate set of steps for each table.
With the Metadata Injection step, we can instead create a single template transformation capable of accepting different queries and generating the desired outputs; the Metadata Injection step then injects each query into the template at runtime. Let's look at how this is done.
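The idea behind the template can be sketched outside Pentaho. The following is an illustrative analogy, not Pentaho code: a reusable pipeline whose query and target table are supplied at runtime, using SQLite in place of MSSQL (all table and function names here are hypothetical).

```python
import sqlite3

def template_transformation(conn, cust_query, target_table_name):
    """Read rows with the injected query and write them to the injected target table."""
    cur = conn.execute(cust_query)
    cols = [d[0] for d in cur.description]
    rows = cur.fetchall()
    conn.execute(f"CREATE TABLE IF NOT EXISTS {target_table_name} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {target_table_name} VALUES ({placeholders})", rows)

# Demo data standing in for one of the MSSQL source tables
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id, name)")
conn.execute("INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob')")

# "Injecting" metadata: the same template runs with runtime-supplied parameters
template_transformation(conn, "select * from customers", "meta_customer")
print(conn.execute("select count(*) from meta_customer").fetchone()[0])  # 2
```

The same `template_transformation` could be reused for `products` and `orders` simply by passing a different query and target name, which is exactly the repetition Metadata Injection removes.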
Step 1: Create a Template Transformation
- Create an empty transformation.
- Drag a Table input step onto the canvas.
- Provide the MSSQL connection details in the Table input step, test the connection, and click OK.
- Do not provide any SQL query, since this is a template; click OK.
- Drag a Table output step onto the canvas.
- Provide the MSSQL connection details in the Table output step, test the connection, and click OK.
- Select the schema, but do not provide a target table, since this is a template; click OK.
- Create a hop from the Table input step to the Table output step.
- Save the transformation as temp_transformation.ktr.
Step 2: Create a Set Query Transformation.
- Create an empty transformation
- Drag a Data grid step onto the canvas.
- Under the Meta tab, under Name, add cust_query and target_table_name, both of type String.
- Under the Data tab, under cust_query, add the queries select * from customers, select * from products, and select * from orders.
- Under the Data tab, under target_table_name, add the names of the output target tables, meta_customer, meta_product, and meta_orders, and click OK.
- Add a Copy rows to result step onto the canvas and create a hop from the Data grid to it.
- Save the transformation as set_query_for_template.ktr.
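The Data grid plus Copy rows to result effectively emit one row per table to copy, pairing each query with its target. A plain-Python sketch of that row set (illustrative only, not Pentaho code):

```python
# Each row pairs an injected query with its target table name,
# mirroring the Data grid contents defined above.
result_rows = [
    {"cust_query": "select * from customers", "target_table_name": "meta_customer"},
    {"cust_query": "select * from products",  "target_table_name": "meta_product"},
    {"cust_query": "select * from orders",    "target_table_name": "meta_orders"},
]
for row in result_rows:
    print(row["cust_query"], "->", row["target_table_name"])
```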
Step 3: Create a Meta Main Transformation.
(We have created a template and the queries to be passed to it; now we need a carrier that delivers the queries to the template. This is achieved with the ETL Metadata Injection step.)
- Open an empty transformation
- Add a Get rows from result step onto the canvas.
- Under Get rows from result, under Fieldname, add the two fields cust_query and target_table_name, of type String, and click OK.
- Add an ETL Metadata Injection step onto the canvas and create a hop from Get rows from result to it.
- Under ETL Metadata Injection, under Transformation, browse for the temp_transformation.ktr file.
- Under Inject Metadata, for the Table input step's SQL attribute, set Source step to Get rows from result and Source field to cust_query.
- Under Inject Metadata, for the Table output step's Target table attribute, set Source step to Get rows from result and Source field to target_table_name, and click OK.
- Save the file as meta_main.ktr.
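Conceptually, the ETL Metadata Injection step loops over the incoming rows and runs the template once per row, with both fields substituted into the template's steps. A minimal stand-alone sketch of that loop (SQLite stands in for MSSQL; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-ins for the three source tables in the MSSQL database
for table in ("customers", "products", "orders"):
    conn.execute(f"CREATE TABLE {table} (id)")
    conn.execute(f"INSERT INTO {table} VALUES (1)")

# Rows arriving from set_query_for_template.ktr via Get rows from result
injection_rows = [
    ("select * from customers", "meta_customer"),
    ("select * from products", "meta_product"),
    ("select * from orders", "meta_orders"),
]

# The "injection" loop: run the template once per row, with the query
# injected into Table input and the target name into Table output
for cust_query, target_table_name in injection_rows:
    conn.execute(f"CREATE TABLE {target_table_name} AS {cust_query}")

print([r[0] for r in conn.execute(
    "select name from sqlite_master where name like 'meta_%' order by name")])
# ['meta_customer', 'meta_orders', 'meta_product']
```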
Step 4: Create a Job to Schedule the Entire Process.
- Create a new empty job
- Add a Start step to the canvas and set the interval at which the job should run.
- Add a Transformation step to the canvas, name it set queries, connect it from the Start step, browse for the set_query_for_template.ktr file under Transformation, and click OK.
- Add another Transformation step to the canvas, name it injection, connect it from the set queries step, browse for the meta_main.ktr file under Transformation, and click OK.
- Save the job as Metadata_injection_job.kjb.
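The job's control flow can be sketched as follows (plain Python standing in for the job engine; the function names are hypothetical and each one represents a .ktr transformation):

```python
# Illustrative sketch of the job wiring, not Pentaho code.
def set_queries():
    # Corresponds to set_query_for_template.ktr: emit (query, target) rows.
    return [
        ("select * from customers", "meta_customer"),
        ("select * from products", "meta_product"),
        ("select * from orders", "meta_orders"),
    ]

def injection(rows):
    # Corresponds to meta_main.ktr: run the template once per incoming row.
    return [f"run template: {query} -> {target}" for query, target in rows]

def metadata_injection_job():
    # Start -> set queries -> injection, as wired in the job above.
    return injection(set_queries())

for action in metadata_injection_job():
    print(action)
```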
We can now run Metadata_injection_job and observe that the new tables meta_customer, meta_product, and meta_orders are created in the MSSQL database, as required.