Job Description
Role Summary:
As a Senior Data Engineer, you will design, develop, and maintain robust and scalable data pipelines and infrastructure on Ubuntu-based AWS environments. You will leverage your expertise in Python-based frameworks and a wide range of data platforms to build and optimize data warehouses, distributed query engines, ETL processes, and business intelligence solutions. You will be responsible for ensuring data quality, performance, and reliability while collaborating with cross-functional teams to deliver impactful data products.
Responsibilities:
- Data Pipeline Development: Design, develop, and implement efficient and scalable ETL/ELT pipelines using Apache NiFi, Airflow, and Airbyte; process data efficiently with PySpark and Apache Flink.
- Data Warehousing: Build and maintain data warehouses using ClickHouse, StarRocks, and Apache Doris, ensuring data integrity and performance.
- Distributed Query Engines: Deploy, configure, and optimize distributed query engines like Trino, Drill, and Dremio for efficient data access and analysis.
- Business Intelligence: Develop and maintain dashboards and visualizations using Superset, Grafana, Metabase, and Redash.
- AWS Infrastructure Management: Deploy, configure, and manage data infrastructure on Ubuntu-based AWS environments, including EC2, S3, VPC, IAM, and CloudWatch.
- Performance Optimization: Tune and optimize data platforms and pipelines for high performance and scalability.
- Data Quality and Governance: Implement data quality checks and monitoring to ensure data accuracy and reliability.
- Troubleshooting and Support: Diagnose and resolve complex data infrastructure issues.
- Collaboration: Work closely with data scientists, analysts, and software engineers to deliver data-driven solutions.
- Documentation: Create and maintain comprehensive documentation for data pipelines and infrastructure.
- Mentorship: Mentor junior data engineers and share best practices.
- Stay Updated: Continuously research and evaluate new technologies and tools.
Requirements
Required Skills and Experience:
- 4-6 years of experience in data engineering roles.
- Strong proficiency in Python and related data manipulation libraries (Pandas, NumPy, Dask).
- Expertise in SQL and database technologies.
- Extensive experience with Ubuntu-based AWS infrastructure (EC2, S3, VPC, IAM, CloudWatch).
- Deep understanding and hands-on experience with at least one tool in each of the following categories:
  - Data Warehousing: ClickHouse, StarRocks, Apache Doris.
  - Data Processing: PySpark, Apache Flink.
  - Distributed Query Engines: Trino, Drill, Dremio.
  - ETL/Data Orchestration: Apache NiFi, Airflow, Airbyte.
  - Business Intelligence: Superset, Grafana, Metabase, Redash.
- Experience with containerization (Docker) and orchestration (Kubernetes) is a plus.
- Strong understanding of data modelling, data warehousing, and data pipeline design principles.
- Excellent troubleshooting and problem-solving skills.
- Strong communication and collaboration skills.
- Ability to work independently and as part of a team.
Preferred Qualifications:
- AWS certifications (e.g., AWS Certified Data Analytics - Specialty).
- Experience with cloud-native data platforms.
- Contributions to open-source projects.
- Experience with data governance and security best practices.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
About Company / Benefits
SarvaGram leads India's fintech revolution, focusing on rural households. Our mission: empower rural communities through credit-led, household-centric solutions that blend high-tech innovation with a personal touch. We catalyze change by bridging financial gaps and boosting agricultural productivity.