About you:
You have a strong quantitative background and data engineering experience, a growth mindset, and a strong work ethic.
- Strong grasp of fundamental concepts in mathematics and machine learning
- Proficient in database programming, SQL, Python, and frameworks such as PyTorch, TensorFlow, Keras, and JAX
- Experience with data pipeline and workflow management tools (e.g. Apache Airflow, Luigi, Flyte, Snakemake, Nextflow, dbt) and data analysis
- You are curious, learn quickly, and have a growth mindset
- You have good written, oral, and visual communication skills in English
- Fluency with Unix environments, GCP/AWS, and GitHub
Ideal/Pluses:
- 3+ years of industry experience
- Experience with bioinformatics sequence analysis and alignment tools
- Experience working with next-generation sequencing data and structural data
- Strong background in Biology, Biochemistry, Bioinformatics, Structural Biology, Organic Chemistry, and/or Physical Chemistry
About the role:
We are looking for candidates who are excited about the opportunity to join the founding team and play an expanding role in the company. Your responsibilities will include:
- Curate datasets from the literature for training and validating new architectures to predict and design protein-protein interactions
- Build databases for scalable storage and fast retrieval of terabases of genomic data, including genomes, genes, proteins, and structures
- Create and deploy data pipelines in the cloud for extracting, processing, storing, and serving large-scale datasets
- Clearly document code and results and communicate outcomes to colleagues
- Work closely with software and ML researchers to build systems for efficient training and deployment of deep learning models
- Design antibody structures and sequences, in collaboration with our wet lab, for functional measurement in frequent design-build-test cycles