IBM InfoSphere Datastage 9.1/11.3 Course Content


Unit-1: Data Warehouse Fundamentals

An introduction to Data warehousing-Purpose of Data warehouse-Data warehouse Architecture-ETL Project Phases-ETL Architecture-Operational Data store-OLTP Vs Warehouse Applications-Data marts-Data marts Vs DWH-Data warehouse Life Cycle-Metadata management

Unit-2: ETL Design Process

Introduction to Extraction, Transformation & Loading-Types of ETL Tools-What to look for in ETL tools-Key tools in the market-ETL Trends & New Solution Options

Unit-3: Datastage Installation

Datastage Installation-Prerequisites to install Datastage-installation process

Unit-4: Introduction to IBM Datastage & QualityStage

History and Features-Differences between 7.5x2 and 8.x-IBM DS & QS 8.0.1-DS InfoSphere 8.5 Enhancements-View on Web console, Profiling & Data Quality-Datastage Introduction-IBM Information Server architecture-Datastage within the IBM Information Server architecture-Datastage components-Datastage main functions-Client components-Traditional Batch Processing-Partition & pipeline Parallelism-Partitioning & Re-Partitioning Techniques-Combinability, Combining and collecting Techniques-Configuration File & Node Components.
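
As a study aid for the partitioning and collecting topics above, here is a minimal Python sketch of two common partitioning methods (round robin and hash) and an ordered collector. It is an illustration of the general idea only, not DataStage code. The function names, the `cust_id` column and the CRC32 hash are assumptions chosen for the example.

```python
from zlib import crc32


def round_robin_partition(rows, num_partitions):
    """Deal rows out to partitions in turn, giving near-equal partition sizes."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def hash_partition(rows, key, num_partitions):
    """Send every row with the same key value to the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is used only because it is deterministic across runs (assumption).
        idx = crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def collect_ordered(partitions):
    """Ordered collection: read all of partition 0, then partition 1, and so on."""
    return [row for part in partitions for row in part]


# Hypothetical sample data: 9 rows spread over 3 customer ids.
rows = [{"cust_id": i % 3, "amount": i} for i in range(9)]
rr_parts = round_robin_partition(rows, 4)
hash_parts = hash_partition(rows, "cust_id", 4)
collected = collect_ordered(hash_parts)
```

Round robin balances the load but scatters related rows, while hash partitioning keeps rows with the same key together, which key-based stages such as Join, Aggregator and Remove Duplicates rely on.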

Unit-5: Datastage Administrator

Datastage Project Administration-Editing projects and Adding projects-Deleting projects-Cleaning up project files-Global Variable setting-Environment management-Auto purging-Runtime Column Propagation (RCP)-Enable Remote Execution of Parallel jobs-Add Checkpoints for sequencer-NLS configuration-Generated OSH (Orchestrate Engine)-System Formats like date, timestamp-Project Protect-Version Details

Unit-6: Datastage Director

Introduction to Datastage Director-Validating Datastage jobs-Executing Datastage jobs-Job execution status-Monitoring a job-Job log view-Job scheduling-Creating Batches-Scheduling batches-Message Handling (Job & Project level)-Unlocking Job & Customize.

Unit-7: Datastage Designer

Introduction to Datastage Designer-Importance of Parallelism-Pipeline Parallelism-Partition Parallelism-Partitioning and collection-SMP (Symmetric Multi Processing)-MPP (Massively Parallel Processing)-Topologies (Two tier, Three Tier)-Partition Techniques-Datastage Repository-Palette-Passive and Active Stages-Job design overview-Designer Work Area-Annotations-Creating jobs, deleting jobs-Parameter passing-Compiling jobs-Batch compiling-Validating jobs-Importing flat file definitions-Managing the Metadata environment-Dataset management-Deletion of Dataset-Routines-Arguments passing to Routine-Importing jobs-Exporting jobs (backup)-Node Configuration-Generate Reports

Unit-8: Working with Datastage Jobs and Stages

Difference between server and parallel jobs-Overview of Parallel jobs, Server jobs, Mainframe jobs, Migration jobs and Sequence jobs-Repository, DS Designer Tool bar and Palette-Active & Passive Stages-Palette Customization-About Link Markers-Framework Operators.

*Design, Compile & Run DS Jobs-
DS job Design Process-Designer Canvas Customization-Compile, Force Compile & multiple Job compile-DTD & OSH Code


*Database Stages-

Enterprise & Plug-in Stages Overview-Oracle-DB2-Teradata-ODBC-SQL Server Stage-Dynamic RDBMS Stage-Orchestrate Schema Import.

*File Stages-
Sequential file & Stage Rules-Data Set & Types (EE Format, DS files & versions)-CFF-Dataset-File Set-Lookup file set-Difference between Data Set, File Set & Sequential File Stages.

*Processing Stages-

Copy-Filter-Funnel-Sort-Remove Duplicates-Aggregator-Modify-Compress-Expand-Decode-Encode-Switch-Pivot stage-Lookup-Join-Merge-Difference between Lookup, Join, Merge, Funnel-Change capture-Change apply-Compare-Difference-Surrogate Key generator.

*Sorting & Vertical Combining-

In-Stage Sorts (Traditional Sort)-Sort Stage (Complex & Simple Sort)-Aggregator Stage-Remove Duplicates Stage.

*Data Transformation with Transformer-

Basic Transformer Vs. Parallel Transformer-External Functions & Macros-Stage Variables & System Variables-Transformer Constraints-Execution order-External Before & After Routines

*Filtering Stages-

Filter Stage-Switch Stage-External Filter Stage-Constraints & Source level

*Debug stages-

Head-Tail-Peek-Column Generator-Row Generator-Write Range Map.

*Real Time Stages-

XML input-XML output-XML transformer-Column Export & Column Import.

*Local and Shared Containers

*Routine Creation

Unit-10: Advanced Stages in Parallel jobs (version 11.3)

Range Lookup process-Surrogate key generator stage-Slowly changing dimension stage-iWay stage-SFTP stage-Java plug-in-Job performance analysis-Resource estimation-Local and Shared containers-Performance tuning.

Slowly Changing Dimensions

Types of Dimensions-Implementing SCD-I & II-SCD Stage (8.0.1)-Change Capture & Change Apply Stage-Difference & Compare Stages-Surrogate Key Stage (State file & sequence object)
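
To make the SCD-I and SCD-II topics above concrete, the following Python sketch shows the usual distinction: Type I overwrites the changed attribute, while Type II expires the current row and inserts a new version with a fresh surrogate key. This is a conceptual illustration under stated assumptions, not the behaviour of the DataStage SCD stage. The column names (`sk`, `start_date`, `end_date`, `is_current`), the function names and the sample data are all hypothetical.

```python
from datetime import date


def apply_scd1(dimension, incoming, key, attrs):
    """Type I: overwrite attributes in place, keeping no history."""
    by_key = {row[key]: row for row in dimension}
    for rec in incoming:
        if rec[key] in by_key:
            for a in attrs:
                by_key[rec[key]][a] = rec[a]
        else:
            new_row = {key: rec[key], **{a: rec[a] for a in attrs}}
            dimension.append(new_row)
            by_key[rec[key]] = new_row
    return dimension


def apply_scd2(dimension, incoming, key, attrs, load_date):
    """Type II: expire the current row and add a new version on change."""
    next_sk = max((row["sk"] for row in dimension), default=0) + 1
    current = {row[key]: row for row in dimension if row["is_current"]}
    for rec in incoming:
        old = current.get(rec[key])
        if old is not None and all(old[a] == rec[a] for a in attrs):
            continue  # nothing changed, keep the current version
        if old is not None:
            old["is_current"] = False
            old["end_date"] = load_date
        new_row = {"sk": next_sk, key: rec[key],
                   **{a: rec[a] for a in attrs},
                   "start_date": load_date, "end_date": None,
                   "is_current": True}
        dimension.append(new_row)
        current[rec[key]] = new_row
        next_sk += 1
    return dimension


# Hypothetical customer dimension with one existing row.
dim2 = [{"sk": 1, "cust_id": "C1", "city": "Pune",
         "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}]
feed = [{"cust_id": "C1", "city": "Delhi"}, {"cust_id": "C2", "city": "Goa"}]
dim2 = apply_scd2(dim2, feed, "cust_id", ["city"], date(2024, 6, 1))

dim1 = apply_scd1([{"cust_id": "C1", "city": "Pune"}], feed, "cust_id", ["city"])
```

After the Type II load, customer C1 has two rows (the expired Pune version and the current Delhi version), whereas the Type I load leaves a single C1 row showing only Delhi.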

Unit-11: Job Sequencers

Arrange job activities in Sequencer-Triggers in Sequencer-Restartability-Recoverability-Notification activity-Terminator activity-Wait for file activity-Start Loop Activity-Execute Command activity-Nested Condition activity-Routine activity-Exception handling Activity-User variable activity-End Loop Activity-Adding Checkpoints.

Unit-12: IBM Information Server Administration Guide

IBM WebSphere Datastage administration-Opening the IBM Information Server Web console-Setting up a project in the console-Customizing the project dashboard-Setting up security-Creating users in the Console-Assigning Security roles to users and groups-Managing licenses-Managing active sessions-Managing logs-Managing schedules-Backing up and restoring IBM Information Server.

Unit-13: Performance Tuning Tips

Performance Tuning with Best Practices (Complex jobs)-Partitioning Techniques-Job Score-Performance Analysis & Estimate Resource.

Unit-14: Job Control

Job Sequencing (Run Stages, Error Handling Stages, Flow Control Stages, etc.)-Overview of JCL Scripting.

Advanced Topics

Parameter Set & Parameter file creation-Data Connection-Advanced Find-Containers