Big Data Engineer

Singapore, Singapore

Overview  

We are seeking a highly skilled Big Data Engineer with strong experience in Lakehouse architecture and distributed data processing to design and build scalable data platforms. The role focuses on developing end-to-end data pipelines for both batch and real-time processing using modern big data technologies such as Spark, Kafka, and Hadoop. The ideal candidate will have hands-on experience implementing Medallion (Bronze/Silver/Gold) architecture and optimizing large-scale data workloads while ensuring data quality, governance, and performance.

Key Responsibilities

  • Design, develop, and optimize end-to-end data pipelines using Lakehouse architecture (Bronze/Silver/Gold layers); a brief PySpark sketch follows this list.
  • Build and maintain ETL/ELT pipelines using Apache Spark (PySpark/Scala) for large-scale data processing.
  • Develop and manage real-time streaming solutions using Kafka, Spark Structured Streaming, or Flink.
  • Build and operate Kafka producers and consumers, ensuring reliable data delivery (exactly-once/at-least-once semantics).
  • Work with Hadoop ecosystem components (HDFS, YARN, Hive, Oozie, HBase/NoSQL) and workflow schedulers such as Airflow.
  • Implement and manage Lakehouse table formats such as Delta Lake, Apache Iceberg, or Apache Hudi.
  • Handle schema evolution, data versioning, ACID transactions, and time-travel capabilities.
  • Optimize data storage and performance through partitioning, clustering, Z-ordering, compaction, and file size tuning.
  • Implement data quality checks, observability, and lineage frameworks (e.g., Great Expectations, Deequ).
  • Tune and optimize performance across Spark, Kafka, and distributed systems.
  • Troubleshoot data pipeline issues including failures, data skew, and streaming backpressure.
  • Deploy and manage data pipelines using CI/CD tools, containerization (Docker), and orchestration tools (Airflow/Argo/Oozie).
  • Collaborate with data architects and stakeholders to ensure data governance, security, and best practices.
  • Contribute to code quality, documentation, and continuous improvement initiatives.
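
For context on the Medallion responsibilities above, here is a minimal sketch of a Bronze-to-Silver step using PySpark and Delta Lake. It assumes a Spark session launched with the delta-spark package; all paths, column names, and table locations are illustrative assumptions, not details from this posting.

```python
# Minimal Bronze -> Silver Medallion sketch with PySpark and Delta Lake.
# All storage paths and column names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("medallion-bronze-to-silver")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Bronze: land raw JSON as-is, tagging each record with an ingestion timestamp.
bronze = (
    spark.read.json("s3a://datalake/raw/orders/")        # hypothetical source path
    .withColumn("ingested_at", F.current_timestamp())
)
bronze.write.format("delta").mode("append").save("s3a://datalake/bronze/orders")

# Silver: cleanse and deduplicate Bronze data into a conformed, partitioned table.
silver = (
    spark.read.format("delta").load("s3a://datalake/bronze/orders")
    .filter(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_ts"))
)
(
    silver.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")                           # partitioning for pruning
    .save("s3a://datalake/silver/orders")
)
```

A Gold layer would typically follow the same pattern, aggregating Silver tables into business-level marts.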

Job Qualifications and Requirements

  • Strong programming experience in Python (PySpark) or Scala.
  • Hands-on experience with Apache Spark (RDD, DataFrame, Structured Streaming) in production environments.
  • Solid experience with Apache Kafka (topics, partitions, offsets, consumer groups); see the streaming sketch after this list.
  • Good understanding of Hadoop ecosystem (HDFS, YARN, Hive/HCatalog).
  • Hands-on experience with Lakehouse platforms such as Delta Lake, Apache Iceberg, or Apache Hudi.
  • Strong knowledge of Medallion Architecture (Bronze/Silver/Gold) and Lakehouse data modeling.
  • Understanding of ACID transactions, schema evolution, and time-travel in data platforms.
  • Experience with data formats such as Parquet, Avro, ORC and compression techniques.
  • Strong SQL skills (Spark SQL, Hive, Presto, or Trino).
  • Experience with CI/CD pipelines, Git, and orchestration tools (Airflow/Oozie).
  • Knowledge of performance tuning for Spark and Kafka.
  • Familiarity with monitoring tools (Prometheus, Grafana, Datadog, etc.).
  • Experience implementing data quality frameworks and data lineage solutions.
  • Strong analytical, troubleshooting, and problem-solving skills.
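
To illustrate the Kafka and Structured Streaming qualifications above, the following is a minimal sketch of a streaming ingest from Kafka into a Delta table. Broker addresses, the topic name, schema, and paths are illustrative assumptions; checkpointing combined with an idempotent Delta sink is what provides the effectively exactly-once delivery mentioned in the responsibilities.

```python
# Minimal Kafka -> Delta streaming sketch with Spark Structured Streaming.
# Brokers, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-orders-stream").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw Kafka records and parse the JSON payload in the value column.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
    .option("subscribe", "orders")                        # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("order"))
    .select("order.*")
)

# Append to a Bronze Delta table; the checkpoint tracks consumed offsets so a
# restarted query resumes without duplicating committed micro-batches.
query = (
    events.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "s3a://datalake/checkpoints/orders")
    .start("s3a://datalake/bronze/orders_stream")
)
query.awaitTermination()
```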

Need more details?
For further inquiries, please email dfe@geco.asia
