Data Lake Solution Helps Enhance Marine Safety & Reduce Manual Efforts by 100%

The customer is one of the global leading metals and mining corporations, with their major products including aluminium, diamonds, copper, gold, industrial minerals, etc. They were looking for skilled resources for their project to ingest 7 varied sources with different formats from 7 different vendors and consolidate and standardize the data in a single database. This would further help them in enhancing their marine safety dashboard.

The Need for Cloud-based Data Lake

The customer has been looking for a partner to assist with their project to build an AWS infrastructure as well as a custom code to ingest and cleanse these data sources and save them into a relational database. The DataLake on Cloud Infrastructure is meant to consolidate the data gathered from multiple unformatted sources in the form of complex Excel Spreadsheets and APIs (Structured and semi-structured)

This consolidated data must then be organized into a user-accessible database, thereby updating its Marine Safety Dashboard automatically. The customer’s Marine Safety data was previously emailed/collected by their Health and Safety Manager, Marine Technical Ops Manager, and Voyage Operations Manager. The data sets include CRM reports, FAI injuries & incidents, recordable injuries, open action & internal ship inspection tracker, inspections, ME incidents, restricted vessels, etc.

Each of these data sets was being sent in text, pdf, or excel format every month to the respective manager. The data was then collated by individual teams into individual ungoverned spreadsheets for conducting the required reporting & analysis. Top challenges to the development of the infrastructure included:

Restrictions in the deployment environment (i.e. interfacing with vendor API’s or portals)
Lack of negotiation power to influence vendors to provide data in the desired format at the desired frequency
Requirements to change internal data collection/entry process
Lack of pooled resources to build the product

The Solution

The customer engaged with Blazeclan, a Premier Consulting Partner of AWS, to help them with their requirements and evade the challenges in building a consolidated data infrastructure. This is intended to offer the customer a solution to manual consolidation and governance of data, data visibility across teams, and automated reporting. Also, the solution will increase the frequency of updates, enabling Analysts and Managers of the customer to remain focused on data analysis and implementation of preventative safety measures.

The Approach to the Solution Involved

Setting-Up Domain Address: This is to receive the files for processing and is carried out by the customer. The customer registered a new domain in the AWS SES section and configured an email address to receive the files.

Capturing the attachment from the email: The email address registered in SES is configured to store the emails into a particular folder in an S3 bucket. Whenever a new email arrives, the push to the S3 bucket invokes a Lambda function that is responsible for extracting the attachment that exists in the email. Every Attachment is then verified and placed into an individual source folder, according to the naming convention provided by the customer. This stage is called the Raw stage.

Cleansing the source files: Every push action in the raw stage invokes individual source lambda functions that are responsible for converting the data set and files in a common format which is easier for processing. This also includes the normalisation of the data such as common date formats for all the sources. This stage is called the Cleansing stage.

Ingestion of the cleansed files: In this process, the files from the cleansing stage are converted or modified according to the agreed data model essential for providing the holistic picture of the dataset from all the sources. This includes identifying the key columns which are required to join with other sources. It is in this stage which is called the Ingestion stage, the data is pushed into a database or data warehouse.

The implemented solution will allow the customer to send all the required email attachments to a single email address. This triggers a series of automated processing pipelines to process the files according to the data model and push it into the data warehouse. The data is consumed in a form of report which highlights the port and vessel incident statistics.

Benefits Achieved by the Customer

High Scalability: The data lake solution provided the customer with the ability to scale their infrastructure according to the changing business demands. The solution further enabled a new auto-scaling feature with minimum or zero downtime.
Automation: Automated data collection, storage and management enabled the customer’s application teams to scale their deployment speed. Automation and data ingestion enables the customer to have a consistently updated view of business performance and customer behaviour.
Analytics-Driven Business decisions: Consolidation of all data sources in a single platform enabled the customer in making efficient decisions.

Tech Stack

Amazon Lambda	AWS SES	Amazon EC2
Amazon RDS	Amazon SNS	Amazon S3
PostgresSQL

Categories

Big Data & Analytics, Datalake
Service Tags

automation, Big Data, Cloud, Data Analytics, Data Lake, North America

Cloud Consulting, Strategy, and Migration

Application Assessment

Cloud Migration

DevSecOps

Platform Modernization

Cloud Optimization & Modernization

Application Modernization

Cloud Native Application Development & Testing

Conversational AI

SaaS Product Development​

Application Maintenance & Support

Cloud Managed Services

Cloud Security Operations

Cloud Security Engineering

Cloud Consulting, Strategy, and Migration

DevSecOps

Cloud Security Engineering

Application Assessment

Cloud Native Application Development & Testing

SaaS Product & Platform Development

Data Strategy

Data Governance and Engineering

Advanced Analytics

Cloud Governance & Reporting

Cloud Discovery & Optimization

DevOps Transformation (DoT)

cAssure

cSecure

SaaS Factory Model

BlazePulse

cSaver

Cloud and Platform Modernization

Cloud Security Operations

Conversational AI

Application Maintenance & Enhancement

Application Modernization

Managed Analytics

BI Modernization

Cloud Managed Services

CSPM

AIOPS

FinOps

cAssure

Data Lake In A Box

DevOps Transformation (DoT)

cSecure

SaaS Factory Model

Blogs

Success Story

Ebooks & Whitepapers

Manufacturing in the AI Era, Malaysia

AWS Cloud Day, Malaysia Edition

Optimizing Cloud Costs: Strategies for Maximizing ROI

Future of Cloud Computing: Strategies for 2025 and Beyond

Financial Services

Banking & Insurance

Media & Entertainment

Telecom

Technology

iGaming

ITC Infotech

About Us

Our Leadership

Customer Speaks

Strategic Partners

OneClan Life

Clouditects

Work With Us

Thought Leadership

Awards and Recognition

Media Coverage

Cloud Consulting, Strategy, and Migration

Application Assessment

Cloud Migration

DevSecOps

Platform Modernization

Cloud Optimization & Modernization

Application Modernization

Cloud Native Application Development & Testing

Conversational AI

SaaS Product Development

SaaS Product Development