CUMTA Recruitment 2023 Junior Data Scientist Posts


2. Senior Data Integration Engineer

• Postgraduate in Computer Science, Information Technology, Big Data,
Data Science, Analytics, Information Systems or a related field is required.
Certification courses in Big-Data/Cloud platform/Data Science/ML will be an
added advantage.

• Minimum 5 years of experience in data management, integration, and
analysis, preferably in a large-scale, complex environment in the
transport sector.

• Strong understanding of data governance, data management principles,
data security, data privacy regulations, and cybersecurity tools is mandatory.

• Experience with data integration tools and technologies, including
ETL tools, data warehouses, APIs, and cloud-based services is mandatory.

• Should have extensive experience in managing and integrating
large-scale datasets from diverse sources in the transportation sector.

• Should have a good understanding of Event-Driven Architecture
implementations.

• Should have architected integration solutions for cloud, hybrid, and
on-premises integration landscapes.

• Experience in developing and optimizing large-scale data pipelines in
multi-cloud and on-premises environments.

• Should advise the project team during system development to ensure
compliance with architectural principles, guidelines, and standards.

• Should understand platform configuration and management, platform
monitoring, performance optimization, platform extension, and user
permission control.

• Should have hands-on experience in configuring AS2, HTTPS, SFTP,
SOAP, and DB connections involving different authentication methods.

• Should be able to analyze and prototype data source connections (data
integration): ODBC, JDBC, XML, and web services (WS).

• Experience with relational SQL and NoSQL databases, including
PostgreSQL, Cassandra, time-series databases (TSDB), etc.

• Should have strong proficiency in writing/reviewing SQL scripts and
optimization of queries/sub-queries.

• Hands-on experience with object-oriented/object function scripting
languages: Java, SQL, Scala, Spark-SQL, PySpark and other related languages.

• Hands-on experience with Hadoop and big data technologies, e.g. Hive,
HBase, Kafka, Spark and other related technologies.

• Working knowledge of message queuing, stream processing and highly
scalable ‘big data’ data stores.

• Experience with data pipeline and workflow management tools such as
Azkaban, Luigi, and Airflow is nice to have.

• Demonstrated experience with GIS tools and FME or similar spatial ETL
tools. Developing knowledge of GIS portals and applications is desirable.

• Experience in modelling spatiotemporal systems such as GIS, urban
datasets, satellite imagery, and other sources is desirable.

• Should have experience in developing large-scale real-time platforms
that provide data collection, analysis and visualization, workflow
integration, and closed-loop systems.

• Strong project management and team leadership skills. Should be able
to explain complex data integration concepts to non-technical stakeholders.

• Excellent communication and interpersonal skills, with the ability to
collaborate effectively with internal and external stakeholders.

• Knowledge of emerging technologies related to data integration, such
as AI, machine learning, and blockchain, would be an advantage.

• Certifications in data management, integration, and analytics, such
as Certified Data Management Professional (CDMP), Microsoft Certified: Azure
Data Engineer Associate, or IBM Certified Data Architect are an added
advantage.