Blazeclan deployed an AWS data lake to provide big data analytics and establish a Single Customer View (SCV) for personalizing future customer interactions.
Customer Profile
The Group Company is a leading content and consumer group in Malaysia and Southeast Asia, focused on the pillars of watch, listen, read and shop. Its TV channel offering is delivered via Direct-To-Home (DTH) satellite TV, IPTV and OTT platforms.
Fulfilling its promise to bridge the digital divide for all of Malaysia, the company offers an entry-level DTH satellite TV service, the country’s first non-subscription-based satellite TV, carrying both TV and radio channels.
The Challenge
Business users of the group company wanted a better way to capture and analyze data generated from approximately 121 data sources. They were looking for a platform that could quickly ingest data from across the group’s subsidiaries, provide enough compute power for real-time analytics, produce results in seconds, and then be shut down when not in use.
The company wanted to establish a Single Customer View (SCV) of its customers so it could analyze past behaviour and better target and personalize future customer interactions. Achieving this business objective required a robust data management platform capable of deriving insights from both structured and unstructured data.
The company also wanted to leverage Amazon Web Services (AWS) for benefits such as managed infrastructure, application services, scalability and fault tolerance.
The Solution
Having attained AWS Big Data Competency status, Blazeclan created a focused analytics group that ensured successful deployment of the data lake in both development and production environments.
Blazeclan’s AWS-certified solution architects helped the customer create a data lake and implement the Hadoop data lake platform. The data lake was hosted in Amazon S3. Apache Spark on Amazon EMR processed the data and made it available to end users through Amazon Athena for querying and Tableau for visualization (a minimal ingestion sketch follows the objectives below). The team designed the architecture with the following objectives in mind:
Platform
- To establish a data model, governance process and documentation (metadata, master data, data lineage and data dictionary)
- To establish data recovery capability with cloud infrastructure
Agility & Completeness in Reporting and Analytics
- To ingest a variety of data sources to have a Single Customer View
- To meet performance SLAs for report delivery
- To provide self-service ad-hoc analysis to multiple users
- To ensure near real-time capability
Robust Analytical Platform capabilities
- To create a platform that accumulates, organizes and analyzes large volumes of structured and unstructured data, enabling users to analyze data from various sources
- To create a platform that supports integration with third-party tools for analysis
Lastly, the Hadoop data lake platform, together with Vertica (the existing enterprise data warehouse), would form a robust hybrid data management platform to support analytics.
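To make the Spark-on-EMR ingestion step concrete, below is a minimal PySpark sketch of the kind of job that could run on the cluster: it reads a raw source from the S3 landing area, applies light cleansing, and writes date-partitioned Parquet back to S3 so Athena and Tableau can query it. The bucket names, paths and column names are illustrative assumptions, not the customer's actual layout.

```python
# Minimal PySpark sketch: ingest a raw source from S3, cleanse it, and write
# partitioned Parquet back to S3 for querying in Athena / Tableau.
# Bucket names, paths and column names below are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scv-ingest").getOrCreate()

# Read one raw source (e.g. viewership events) from the landing zone.
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-landing-zone/viewership/")
)

# Light cleansing: standardize the customer key, derive a date partition,
# and drop duplicate events.
events = (
    raw.withColumn("customer_id", F.trim(F.col("customer_id")))
       .withColumn("event_date", F.to_date("event_timestamp"))
       .dropDuplicates(["customer_id", "event_timestamp"])
)

# Write curated Parquet, partitioned by date, to the analytics zone of the lake.
(
    events.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-zone/viewership/")
)

spark.stop()
```

Partitioning by date keeps downstream Athena queries cheap, since only the relevant partitions are scanned when reports filter on a date range.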
Outcomes
- Fast insights: The data lake platform supports ingestion from diverse data sources, enabling better data correlation and analytical model building that drive actionable insights.
- Simplified data analysis: The customer team now spends less time managing the data pipeline and more time focusing on analytics.
- Early identification of patterns: By leveraging the data lake, the customer can identify patterns such as transaction and viewership distribution.
- Single Customer View: The data lake strengthened the analytics practice by enabling a single, consolidated view of each customer.
AWS Services used
Blazeclan used a number of AWS services to execute this project successfully; a sketch of launching a transient EMR cluster follows the list below.
- Amazon EC2 provided compute capacity for application deployment, reducing the time required to spin up new server instances to minutes and allowing capacity to be scaled up and down as required.
- Amazon Redshift handled the work of setting up, operating and scaling the data warehouse.
- Amazon S3 stored and retrieved any amount of data and served as the storage layer for the data lake.
- Amazon DynamoDB served applications that required consistent, single-digit-millisecond latency at any scale.
- Amazon EMR provided the managed cluster platform for running big data frameworks such as Apache Hadoop and Apache Spark on AWS to process and analyze vast amounts of data.
- AWS CloudTrail enabled governance, compliance, operational auditing, and risk auditing of the AWS account.
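One challenge requirement was that compute should run only when needed and be shut down afterwards. Below is a minimal boto3 sketch of one way to meet that with EMR: launching a transient cluster that runs a Spark step and terminates itself when the step finishes. The cluster sizing, IAM roles, region and S3 paths are assumptions for illustration, not the customer's actual configuration.

```python
# Minimal boto3 sketch: launch a transient EMR cluster that runs a Spark job
# and terminates itself when done, so compute costs stop when analysis stops.
# Region, instance types, roles and S3 paths are illustrative assumptions.
import boto3

emr = boto3.client("emr", region_name="ap-southeast-1")

response = emr.run_job_flow(
    Name="scv-ingest-transient",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        # Terminate the cluster automatically once all steps have finished.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[
        {
            "Name": "run-scv-ingest",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://example-artifacts/jobs/scv_ingest.py"],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)

print("Started cluster:", response["JobFlowId"])
```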
Whether you are just getting started or looking to advance your cloud journey and need assistance with cloud consulting, cloud-native application development, big data analytics, DevOps or automated managed services, please write to us at sales@139.59.4.14.