Job Description: Data Engineer

Location: Bangalore, India
Experience: 2–3 years

About Quation Solution:

In today’s data-driven world, information is power, but raw data alone can’t fuel success. At Quation, we bridge that gap. We’re a data analytics consultancy whose purpose is to empower brands to unlock the true potential of their data.

We translate complex data landscapes into clear, actionable insights that inform strategic decision-making across our clients’ organizations. We combine cutting-edge technology with a deep understanding of each client’s industry to ensure our solutions are tailored to their unique challenges and opportunities.

If you are a passionate and skilled Data Engineer looking for an exciting opportunity, we would love to hear from you. Apply now to join our team in Bangalore!

Job Overview:

We are seeking a highly skilled Data Engineer with expertise in AWS and PySpark to design, develop, and maintain data pipelines and infrastructure for large-scale data processing. The ideal candidate will have a strong background in distributed systems, ETL processes, and cloud-based data engineering.

Key Responsibilities:

  1. Data Pipeline Development:
    • Design, implement, and optimize robust data pipelines using PySpark for batch and streaming data processing (a minimal batch sketch follows this list).
    • Ensure data pipelines are scalable, maintainable, and efficient.
  2. Cloud Infrastructure:
    • Build and manage cloud-based data solutions on AWS services such as S3, Glue, Lambda, Athena, EMR, and Redshift.
    • Monitor and optimize cloud resource usage to ensure cost efficiency.
  3. Data Integration and Management:
    • Integrate data from multiple sources into a unified data warehouse.
    • Maintain metadata and uphold data quality standards across all pipelines.
  4. Collaboration and Documentation:
    • Collaborate with data scientists, analysts, and business stakeholders to gather requirements.
    • Document workflows, processes, and best practices.
  5. Performance Tuning:
    • Optimize the performance of ETL jobs and distributed processing workflows in PySpark.
    • Troubleshoot and resolve issues related to data processing and storage.
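
To give candidates a concrete picture of the pipeline work described above, here is a minimal PySpark batch-ETL sketch. It is illustrative only: the app name, bucket paths, and column names are assumptions, not actual Quation infrastructure.

    # Minimal PySpark batch-ETL sketch; all paths and column names are
    # illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

    # Read raw CSV files landed in S3 (hypothetical bucket and prefix).
    raw = spark.read.option("header", "true").csv("s3://example-raw/orders/")

    # Drop malformed rows, cast types, then aggregate revenue per customer.
    clean = (
        raw.dropna(subset=["order_id", "customer_id", "amount"])
           .withColumn("amount", F.col("amount").cast("double"))
    )
    revenue = clean.groupBy("customer_id").agg(F.sum("amount").alias("revenue"))

    # Write Parquet for downstream tools such as Athena or Redshift Spectrum.
    revenue.write.mode("overwrite").parquet("s3://example-curated/revenue/")

    spark.stop()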

Required Skills and Qualifications:

  • Technical Expertise:
    • Proficient in PySpark and distributed data processing frameworks.
    • Hands-on experience with AWS services such as S3, Glue, EMR, Lambda, and Redshift.
    • Strong SQL skills for querying and data transformation.
    • Familiarity with orchestration tools like Apache Airflow or AWS Step Functions (see the Airflow sketch after this list).
  • Programming Skills:
    • Strong Python programming experience, particularly in data engineering and ETL development.
    • Experience with version control tools like Git.
  • Cloud and Infrastructure:
    • Experience in managing and optimizing cloud data solutions, with a focus on cost and performance.
    • Understanding of infrastructure-as-code (IaC) tools like Terraform or CloudFormation.
  • Problem Solving:
    • Strong analytical and troubleshooting skills.
    • Ability to work with large datasets and ensure high performance.
  • Experience:
    • 2–3 years of professional experience in data engineering.
    • Prior experience in data modeling, data warehousing, and handling structured and unstructured data.
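
To illustrate the orchestration skills listed above, here is a minimal Airflow 2.x DAG sketch that schedules the PySpark job from the earlier example once a day. The DAG id, script path, and spark-submit invocation are assumptions for illustration, not a prescribed setup.

    # Minimal Airflow 2.x DAG: one daily task that submits a PySpark job.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="orders_daily_etl",          # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_etl = BashOperator(
            task_id="run_spark_etl",
            # Master/deploy-mode flags depend on the target cluster (EMR, Glue, etc.).
            bash_command="spark-submit /opt/jobs/orders_daily_etl.py",
        )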

Preferred Qualifications:

  • AWS certifications like AWS Certified Data Analytics – Specialty or AWS Certified Solutions Architect.
  • Experience with real-time data streaming technologies such as Kafka or Kinesis (see the streaming sketch after this list).
  • Knowledge of DevOps practices and CI/CD pipelines.
  • Exposure to machine learning workflows or data science projects is a plus.
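
For the streaming qualification above, a Spark Structured Streaming job consuming from Kafka might look like the sketch below. The broker address, topic name, and sink paths are placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

    # Sketch of a Spark Structured Streaming job reading from Kafka.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("events-stream").getOrCreate()

    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
             .option("subscribe", "events")                      # placeholder topic
             .load()
    )

    # Kafka delivers values as bytes; cast to string before parsing downstream.
    decoded = events.select(F.col("value").cast("string").alias("payload"))

    # Stream to Parquet with a checkpoint so the job can restart safely.
    query = (
        decoded.writeStream.format("parquet")
               .option("path", "s3://example-sink/events/")
               .option("checkpointLocation", "s3://example-sink/checkpoints/")
               .start()
    )
    query.awaitTermination()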

Key Attributes:

  • A self-starter with a passion for solving data challenges.
  • Excellent communication and teamwork skills.
  • Strong attention to detail and focus on data accuracy.