Data Engineer - Biological Sequences & Structure

Medium Biosciences

Date listed

2 months ago

Employment Type

Full time

Remote

Yes

Found on:

YCombinator Startups

Keywords: remote github python apache aws tensorflow ml gcp unix pytorch sql keras

About you:

You have a strong quantitative background and data engineering experience, a growth mindset, and a strong work ethic.

Strong grasp of mathematics and fundamental machine learning concepts
Proficient in database programming, SQL, Python, and frameworks such as PyTorch, TensorFlow, Keras, and JAX
Experience with data pipeline and workflow management tools (e.g. Apache Airflow, Luigi, Flyte, Snakemake, Nextflow, dbt) and data analysis
You have a growth mindset, are curious, and learn quickly
You have good written, oral, and visual communication skills in English
Fluency with Unix environments, GCP/AWS, and GitHub

Ideal/Pluses:

3+ years of industry experience
Experience with bioinformatics sequence analysis and alignment tools
Experience working with next-generation sequencing data and structure data
Strong background in Biology, Biochemistry, Bioinformatics, Structural Biology, Organic Chemistry, and/or Physical Chemistry.

About the role:

We are looking for candidates who are excited about the opportunity to join the founding team and play an expanding role in the company. Your responsibilities will include:

Curate datasets from the literature for training and validating new architectures to predict and design protein-protein interactions
Build databases for scalable storage and fast retrieval of terabases of genomic data, including genomes, genes, proteins, and structures
Create and deploy data pipelines in the cloud for extracting, processing, storing, and serving large-scale datasets.
Clearly document code and results and communicate outcomes to colleagues
Work closely with software & ML researchers to build systems for efficient training and deployment of deep learning models.
In collaboration with our wet lab, design antibody structures and sequences for functional measurement in frequent design-build-test cycles