The Business Intelligence (BI) market depends heavily on ETL architecture. Extract, Transform and Load (ETL) products have become far more important in the data-driven age. DataStage is one of the most important ETL tools, and it integrates data effectively across various systems. DataStage jobs manage the collection, transformation, validation and loading of data from different systems into data warehouses. DataStage supports business analysis through its user-friendly interface and by providing quality data that helps in gaining business intelligence. When IBM acquired DataStage in 2005, it was renamed IBM WebSphere DataStage and later IBM InfoSphere DataStage.
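To make the extract, transform, validate and load steps concrete, here is a minimal Python sketch of the same flow. It is not DataStage code: the CSV sample, the `sales` table and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse. Invalid rows are routed to a reject list, much as a DataStage job can send them down a reject link.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a small CSV "flat file" held in memory.
SOURCE_CSV = """customer_id,name,amount
1,alice,120.50
2,bob,abc
3,Carol,75.00
"""

def extract(text):
    """Extract: read rows from a delimited source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform and validate: cast amounts, tidy names, reject bad rows."""
    good, rejects = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejects.append(row)  # invalid record goes to the reject list
            continue
        good.append((int(row["customer_id"]), row["name"].strip().title(), amount))
    return good, rejects

def load(conn, rows):
    """Load: insert the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a data warehouse
good, rejects = transform(extract(SOURCE_CSV))
load(conn, good)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone())  # (2, 195.5)
print(len(rejects))  # 1
```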
DataStage features and usage via web interface
DataStage provides the following high-end features and benefits:
DataStage is a scalable ETL platform that facilitates the collection, integration and transformation of large volumes of data, with data structures ranging from simple to complex.
- Big Data & Hadoop support:
It enables users to access big data directly on a distributed file system. It also provides JSON support and a new JDBC connector, which help clients take advantage of new data sources.
- Real time data integration:
It provides near real-time data integration and also supports connectivity between data sources and applications.
- Work load management:
It helps users prioritize mission-critical tasks and optimize hardware utilization.
- Ease of use:
It simplifies the management of data integration infrastructure by improving the speed, flexibility and effectiveness with which jobs are built, deployed and updated.
- Security Controls:
It allows researchers to have a private area that is accessible only to them and their group leader. There can also be shared, collaborative areas whose files are accessible to the whole research group.
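The workload management idea of running mission-critical jobs first can be illustrated with a simple priority queue. This Python sketch is only a conceptual analogy, not DataStage's own workload management configuration; the job names and numeric priorities are hypothetical, with a lower number meaning a more critical job.

```python
import heapq
import itertools

class JobQueue:
    """Illustrative priority queue: lower number = more critical job."""

    def __init__(self):
        self._heap = []
        # Tie-breaker keeps first-come, first-served order within one priority.
        self._counter = itertools.count()

    def submit(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_job(self):
        _priority, _seq, name = heapq.heappop(self._heap)
        return name

queue = JobQueue()
queue.submit("nightly_archive", priority=3)
queue.submit("finance_month_end_load", priority=1)  # mission-critical
queue.submit("adhoc_report_extract", priority=2)
queue.submit("risk_feed_load", priority=1)          # mission-critical

order = [queue.next_job() for _ in range(4)]
print(order)
# ['finance_month_end_load', 'risk_feed_load', 'adhoc_report_extract', 'nightly_archive']
```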
Services and Study Materials
- Study materials on DataStage and interview questions will be provided.
- Important interview questions and precise answers will be covered while discussing the corresponding topics in class.
- Regular assignments in class.
- Real-time scenarios discussed in class.
- Extra scenarios provided for practice.
DataStage Course Contents
The course does not involve much PL/SQL; it focuses more on SQL scripting.
- Unix requirements in ETL projects
- Unix commands
- Basic shell scripts
The following topics cover most of the material in DataStage 8.5 (DS 8.5):
- Introduction to the Enterprise Edition (EE) architecture
- Introduction to the DS clients (Designer, Manager, Director, Administrator)
- Creating parallel jobs using stages (Change Capture, Lookup, Join, Merge, Funnel, Remove Duplicates, Sort, Modify, Copy, DRS, Oracle, BASIC/Parallel Transformer)
- Accessing sequential data (Reading flat files)
- EE data types
- Partitioning data
- Combining data
- Sorting and aggregating data
- Transforming data
- Best practices and job design guidelines
- Database usage
- Environment variables
- Performance tuning
- Standards and techniques
- Accessing relational data
- Compilation and execution
- Testing and debugging
- Metadata in EE
- Job control
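The Change Capture stage in the course list compares a "before" and an "after" dataset on a key and classifies each record. The Python sketch below mimics that idea with plain dictionaries. It assumes a single hypothetical key column named `id` and uses string labels (`copy`, `insert`, `edit`, `delete`) for readability; the real stage records the change type in an output column and can be configured to drop some change types.

```python
def change_capture(before, after, key):
    """Classify records by comparing a 'before' and an 'after' dataset on a key."""
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}
    changes = []
    for k, row in after_by_key.items():
        if k not in before_by_key:
            changes.append(("insert", row))  # key only in the new data
        elif row != before_by_key[k]:
            changes.append(("edit", row))    # key in both, values differ
        else:
            changes.append(("copy", row))    # key in both, unchanged
    for k, row in before_by_key.items():
        if k not in after_by_key:
            changes.append(("delete", row))  # key only in the old data
    return changes

# Hypothetical snapshots of a customer table.
before = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}, {"id": 3, "city": "Mumbai"}]
after = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Chennai"}, {"id": 4, "city": "Kolkata"}]

changes = change_capture(before, after, "id")
print([(code, row["id"]) for code, row in changes])
# [('copy', 1), ('edit', 2), ('insert', 4), ('delete', 3)]
```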
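Partitioning decides which processing node receives each row in a parallel job. The Python sketch below contrasts two common methods, round robin and hash partitioning, on hypothetical customer rows. Round robin balances partition sizes but can split a key across partitions, while hash partitioning keeps every row with the same key together, which key-based operations such as joins, duplicate removal and aggregation rely on. `zlib.crc32` is used only because Python's built-in `hash()` of strings varies between runs; it is not a claim about the hash function DataStage uses.

```python
import zlib

# Hypothetical input rows keyed on customer ID.
rows = [
    {"cust": "C1", "amt": 10},
    {"cust": "C2", "amt": 20},
    {"cust": "C1", "amt": 5},
    {"cust": "C3", "amt": 7},
    {"cust": "C2", "amt": 1},
]

def round_robin(rows, n):
    """Deal rows to n partitions in turn: even sizes, but keys get scattered."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Route every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is deterministic across runs, unlike the built-in hash() of a str
        h = zlib.crc32(row[key].encode("utf-8"))
        parts[h % n].append(row)
    return parts

def keys_per_partition(parts, key):
    """List the distinct key values held by each partition."""
    return [sorted({row[key] for row in part}) for part in parts]

def key_is_colocated(parts, key):
    """True if no key value is split across more than one partition."""
    home = {}
    for idx, part in enumerate(parts):
        for row in part:
            if home.setdefault(row[key], idx) != idx:
                return False
    return True

rr = round_robin(rows, 2)
hp = hash_partition(rows, 2, "cust")
print(keys_per_partition(rr, "cust"))  # [['C1', 'C2'], ['C2', 'C3']]
print(key_is_colocated(rr, "cust"))    # False: C2 is split across partitions
print(key_is_colocated(hp, "cust"))    # True
```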
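Sorting, removing duplicates and aggregating usually go together: the Remove Duplicates stage expects its input sorted on the key columns, and the same holds for Python's `itertools.groupby`, which only groups adjacent rows. This hypothetical Python example sorts order rows on customer and order ID, keeps the first row of each duplicate group, and then sums the amount per customer. The column names are illustrative assumptions.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical order rows; one record is an exact duplicate.
rows = [
    {"cust": "C2", "order_id": 7, "amt": 20.0},
    {"cust": "C1", "order_id": 3, "amt": 10.0},
    {"cust": "C1", "order_id": 3, "amt": 10.0},  # duplicate record
    {"cust": "C1", "order_id": 5, "amt": 5.0},
    {"cust": "C2", "order_id": 8, "amt": 1.0},
]

dedup_key = itemgetter("cust", "order_id")

# Sort first so that rows sharing a key are adjacent.
sorted_rows = sorted(rows, key=dedup_key)

# Remove duplicates: keep the first row of each (cust, order_id) group.
deduped = [next(group) for _, group in groupby(sorted_rows, key=dedup_key)]

# Aggregate: deduped is still sorted on cust, so groupby sees each customer once.
totals = {
    cust: sum(row["amt"] for row in group)
    for cust, group in groupby(deduped, key=itemgetter("cust"))
}
print(len(deduped), totals)  # 4 {'C1': 15.0, 'C2': 21.0}
```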