
Job Summary:
We are seeking a highly skilled Big Data Engineer with 8+ years of experience in data migration, data setup, and data systems development. The ideal candidate will have deep expertise in Apache Spark, SQL, and Java (with Scala) for large-scale data processing, reporting, and system development. Strong knowledge of data architecture and semantic layer development, together with hands-on experience in regression testing and cutover activities for enterprise-level migrations, is essential.
Key Responsibilities:
Spark:
- Design, develop, and optimize Spark-based ETL pipelines for large-scale data processing and analytics (see the sketch after this list).
- Utilize Spark SQL, DataFrames, RDDs, and Streaming for efficient data transformations.
- Tune Spark jobs for performance, including memory management, partitioning, and execution plans.
- Implement real-time and batch data processing using Spark Streaming or Structured Streaming.
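For illustration, here is a minimal sketch of the kind of Spark ETL pipeline described above, written in Scala. The S3 paths, column names, and tuning values are hypothetical placeholders, not references to any actual system.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrdersEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-etl")
      // Hypothetical tuning value; real settings depend on cluster size and data volume.
      .config("spark.sql.shuffle.partitions", "200")
      .getOrCreate()

    // Hypothetical input: a raw orders feed in Parquet.
    val orders = spark.read.parquet("s3://example-bucket/raw/orders")

    // Typical transformation chain: filter bad records, derive a column, aggregate.
    val dailyRevenue = orders
      .filter(col("status") === "COMPLETED")
      .withColumn("order_date", to_date(col("order_ts")))
      .groupBy(col("order_date"))
      .agg(sum("amount").as("revenue"))

    // Partition output by date so downstream reads can prune partitions.
    dailyRevenue.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-bucket/curated/daily_revenue")

    spark.stop()
  }
}
```

A Structured Streaming job follows the same DataFrame API, substituting readStream/writeStream for the batch read/write calls shown here.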
SQL:
- Write and optimize complex SQL queries for data extraction, transformation, and aggregation (a representative example follows this list).
- Perform query performance tuning, indexing, and partitioning for efficient execution.
- Develop stored procedures, functions, and views to support data operations.
- Ensure data consistency, integrity, and security across relational databases.
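As a representative example of the SQL work above, the sketch below runs a windowed aggregation through Spark SQL against a temporary view. The table name, columns, and path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object RevenueReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("revenue-report").getOrCreate()

    // Hypothetical curated dataset registered as a temporary view.
    spark.read.parquet("s3://example-bucket/curated/orders")
      .createOrReplaceTempView("orders")

    // Aggregation plus a window function: rank customers by monthly spend.
    val report = spark.sql(
      """
        |SELECT customer_id,
        |       date_trunc('month', order_ts) AS month,
        |       SUM(amount)                   AS total_spend,
        |       RANK() OVER (PARTITION BY date_trunc('month', order_ts)
        |                    ORDER BY SUM(amount) DESC) AS spend_rank
        |FROM orders
        |WHERE status = 'COMPLETED'
        |GROUP BY customer_id, date_trunc('month', order_ts)
        |""".stripMargin)

    report.show(20, truncate = false)
    spark.stop()
  }
}
```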
Java (Scala knowledge preferred):
- Develop backend services and data processing applications using Java and Scala.
- Optimize JVM performance, including memory management and garbage collection, for Spark workloads.
- Leverage Scala’s functional programming capabilities for efficient data transformations (illustrated in the sketch after this list).
- Implement multithreading, concurrency, and parallel processing in Java for high-performance systems.
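A brief sketch of the functional and concurrent style this role calls for: pure parsing and transformation functions over immutable data, with independent chunks processed in parallel on the JVM. The record shape and sample values are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object RecordPipeline {
  // Immutable case class modeling a parsed record (hypothetical shape).
  final case class Trade(id: Long, symbol: String, qty: Int, price: BigDecimal)

  // Pure function: parse a CSV line, returning None for malformed input.
  def parse(line: String): Option[Trade] = line.split(',') match {
    case Array(id, symbol, qty, price) =>
      scala.util.Try(Trade(id.trim.toLong, symbol.trim, qty.trim.toInt, BigDecimal(price.trim))).toOption
    case _ => None
  }

  // Pure transformation composed from map/filter/fold; no shared mutable state.
  def notional(trades: Seq[Trade]): BigDecimal =
    trades.filter(_.qty > 0).map(t => t.price * t.qty).foldLeft(BigDecimal(0))(_ + _)

  def main(args: Array[String]): Unit = {
    val lines = Seq("1,AAPL,100,189.25", "2,MSFT,50,402.10", "bad,row")

    // Process independent chunks in parallel with Futures (JVM thread pool underneath).
    val chunks = lines.grouped(2).toSeq
    val work: Seq[Future[BigDecimal]] =
      chunks.map(chunk => Future(notional(chunk.flatMap(parse))))

    val total = Await.result(Future.sequence(work).map(_.sum), 10.seconds)
    println(s"Total notional: $total")
  }
}
```

The same separation of pure logic from parallel execution carries over to Java via ExecutorService or CompletableFuture.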
Required Skills & Qualifications:
- 8+ years of experience in data engineering, with a focus on big data technologies.
- Strong proficiency in Apache Spark, SQL, and Java/Scala.
- Experience in data migration, data setup, and semantic layer development.
- Solid understanding of data architecture, ETL frameworks, and data governance.
- Hands-on experience with regression testing and cutover planning in large-scale data migrations.
- Familiarity with cloud platforms (e.g., AWS, Azure, GCP) is a plus.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration abilities.
Preferred Qualifications:
- Experience with Hadoop ecosystem tools (Hive, HDFS, Oozie, etc.).
- Knowledge of containerization and orchestration (Docker, Kubernetes).
- Exposure to CI/CD pipelines and DevOps practices.
- Relevant certifications in Big Data or Cloud technologies.