Open Position at Hatch

Data Engineer

We are seeking a skilled and motivated Intermediate Data Engineer to join our growing team and play a critical role in building the data infrastructure necessary to train cutting-edge AI models. You will be responsible for designing, developing, and maintaining robust and scalable data pipelines on Google Cloud Platform (GCP) to ingest, process, transform, and prepare large datasets for machine learning. This is an exciting opportunity to be at the forefront of our AI initiatives and work with a talented team to drive innovation.

Location: Remote - Burnaby, Canada
Time: Full Time

Responsibilities:

  • Design, build, and maintain scalable and reliable data pipelines using GCP data engineering services (e.g., Dataflow, Cloud Composer, Cloud Functions, Pub/Sub).
  • Develop and implement data ingestion processes from various sources into GCP (e.g., Cloud Storage, BigQuery, Cloud SQL).
  • Implement data transformation and cleaning logic to ensure data quality and prepare it for AI model training.
  • Optimize data pipelines for performance, efficiency, and cost-effectiveness on GCP.
  • Build and maintain data warehouses and data lakes on GCP (e.g., BigQuery, Cloud Storage).
  • Develop data monitoring and alerting systems to ensure pipeline health and data integrity.
  • Collaborate closely with data scientists and machine learning engineers to understand their data requirements and provide efficient data solutions.
  • Implement data governance and data security best practices on GCP.
  • Write clear and concise technical documentation for data pipelines and processes.
  • Stay up to date with the latest advancements in data engineering and GCP technologies relevant to AI/ML.
  • Troubleshoot and resolve data pipeline issues.
  • Contribute to the development of analytics dashboards and reports to track data quality and pipeline performance, potentially using tools like Looker Studio.
  • Implement data versioning and reproducibility for machine learning experiments.

Qualifications:

  • Bachelor's degree in Computer Science, Data Science, Engineering, or a related quantitative field (or equivalent practical experience).
  • 3 to 5 years of professional experience in data engineering.
  • Strong proficiency in SQL and experience working with large datasets.
  • Extensive hands-on experience with Google Cloud Platform (GCP) data engineering services, including:
    • Data Pipelines: Dataflow (Apache Beam), Cloud Composer (Apache Airflow), Cloud Functions, Pub/Sub.
    • Data Storage: BigQuery, Cloud Storage, Cloud SQL/Cloud Spanner.
    • Data Orchestration: Cloud Composer (Apache Airflow).
    • Data Transformation: Dataflow, BigQuery SQL.
  • Experience with at least one programming language commonly used in data engineering (e.g., Python, Scala, Java). Strong preference for Python.
  • Familiarity with data warehousing concepts and data modeling.
  • Experience with data quality monitoring and data governance principles.
  • Understanding of ETL/ELT processes and best practices.
  • Experience with version control systems, particularly Git.
  • Excellent problem-solving, analytical, and communication skills.
  • Ability to work independently and collaboratively in a fast-paced environment.  

Bonus Points:

  • Experience building data pipelines specifically for training machine learning models.
  • Familiarity with machine learning concepts and workflows.
  • Experience with feature engineering and data preparation techniques for ML.
  • Knowledge of data visualization tools (e.g., Looker Studio, Tableau, Power BI).
  • Experience with infrastructure-as-code (IaC) tools like Terraform or Deployment Manager on GCP.
  • Familiarity with data streaming technologies.
  • Experience with metadata management tools on GCP.

What We Offer:

  • Competitive salary (range: $100K to $150K annually) and benefits package.
  • Opportunity to work on cutting-edge AI projects and make a significant impact.
  • Collaborative and innovative work environment.
  • Flexible work arrangements (hybrid or remote options considered).

To Apply:

Interested candidates are invited to submit their resume and a cover letter to careers@hatchintelligence.com. The cover letter should detail your relevant experience with GCP data engineering and your interest in working with data for AI model training.

Looking forward to hearing from you!
