Background
Our client is the largest independent processor of flat-rolled steel and manufacturer of industrial cylinders and pressure vessels in the United States.
The client ran a heavily customized BI stack (Informatica mappings, data warehouse, RPD models, and dashboards) hosted on Oracle Cloud, along with a Cloudera data lake on AWS that captured detailed operational data from machines, sensors, and shop floor applications to help improve production efficiency and quality.
However, both the Oracle DW and the Cloudera data lake were unstable and unable to integrate with BI tools, so there was a strong need to modernize the cloud data and analytics platform.
Current Challenges
The client faced several challenges with their existing data infrastructure:
- Stability of the Cloudera data lake – many outages and incidents
- Data redundancy across the data warehouse, data lake, and Tableau extracts
- Tableau connectivity issues with CDP Impala
- Tableau extracts required to hit query performance targets
- Unable to reliably hit data latency SLAs
- Complexity of the environment required additional resources
- Insufficient visibility into data pipeline health and data quality
Solution
To address these challenges, we proposed the following solution:
- Recommend and prove out the target-state data and analytics architecture
- Evaluate and shortlist technology and tools (Matillion, Talend, dbt, StreamSets, Airflow, Astronomer)
- Pilot two use cases on the recommended architecture, one on the data warehouse (ETL process) and one on the data lake (streaming data), to meet success criteria including acceptable performance
- Help set up the infrastructure environment (AWS, Snowflake, StreamSets)
- Extend the pilot implementation to migrate DW tables (54 dimension and 60 fact) and data lake artifacts (274 base tables and 73 views) for about 800 users
- Set up and automate the CI/CD pipeline with GitHub Actions and branches for StreamSets + Snowflake, with scheduling and monitoring tools
The Results
- 30% performance improvement from migrating Oracle Cloud and Cloudera to an AWS-Snowflake stack.
- 20% SLA improvement from modernizing Oracle and Cloudera workloads on AWS-Snowflake.
- 50% cost and time savings when transforming Informatica workflows and the Oracle EDW to AWS.
- Reduced operational expenses, with over $350k in annual savings from consolidating the data warehouse and data lakes.
Migrating data from Oracle Cloud and Cloudera to the AWS-Snowflake platform significantly improved reporting capabilities. The unified, scalable platform facilitated self-service BI, reducing manual reporting effort, improving data availability, and enabling data-driven decision-making.



