Platform Engineer

Triomics

Date listed

2 months ago

Employment Type

Full time

Found on:

YCombinator Startups

Keywords: node aws agents ai azure kubernetes docker ml

About Triomics

Triomics is building the agentic AI layer for oncology EHRs. Cancer hospitals spend billions on highly trained staff manually reading unstructured patient records - pathology reports, clinical notes, genomic panels - to power workflows like trial matching, registry curation, visit prep, and quality reporting. We replace that manual work with task-driven AI agents that sit inside the EMR and process records at scale, in real time.

Our platform is trusted by 4 of the top 10 Best Hospitals for Cancer as ranked by U.S. News, and by several of the largest community practices. We have grown 10x in the last year and process millions of oncology medical documents monthly.

Our investors include Lightspeed, General Catalyst, Nexus Venture Partners, and Y Combinator.

Role

This role spans backend product engineering and infrastructure. You'll build backend services and application features, and also own the cloud infrastructure, deployments, and CI/CD that keep them running in production. The platform processes millions of clinical documents monthly across multi-tenant deployments in both customer and Triomics cloud environments, with GPU infrastructure serving AI extraction models. We need someone who can write application code in the morning and debug a Kubernetes deployment issue in the afternoon.

What Success Looks Like in the First 90 Days

Days 1-30: Map the entire infrastructure and find what's fragile.

Get access to every deployment - AWS, Azure, customer-hosted environments. Understand the full topology: how Kubernetes clusters are configured, how GPU nodes serve models, how document pipelines move data from EHR ingestion to extraction to structured output. Your first job is to understand what is already built, where the sharp edges are, and what breaks when load spikes or a deployment goes sideways. By end of month one, you should have a written map of every production environment, know which deployments are most fragile, and have identified the top 3 infrastructure risks.

Days 30-60: Own production stability and start shipping backend services.

Take ownership of at least one customer deployment end-to-end - monitoring, alerting, incident response. Set up observability that catches pipeline failures and data quality regressions before customers report them (today, customers often find issues first). Simultaneously, pick up a backend product feature - patient data processing, document pipeline improvement, or a platform feature the product team needs. Ship it. The goal is to make sure you can context-switch between infra firefighting and product engineering.

Days 60-90: Standardize deployments and monitor everything.

Document deployment runbooks, automate what's manual, and build CI/CD improvements that make releases safer and faster. You should have a clear plan for what the infrastructure needs to look like to support 2-3x the current customer count without adding headcount proportionally.

Responsibilities

Build and ship infrastructure services that power our product - document pipelines, application logic, and platform features
Own cloud infrastructure and deployment pipelines across both Triomics and customer environments (AWS, Azure)
Manage Kubernetes clusters, containerized services, CI/CD, and release processes, including GPU node management for model serving
Build monitoring, alerting, and observability across production deployments - we process millions of documents and need to catch pipeline failures, data quality regressions, and infrastructure issues before customers do
Debug and resolve production issues end-to-end - from application-layer bugs to infrastructure failures
Work closely with our offshore engineers, who make up a significant portion of the engineering team, on architecture decisions, code reviews, and production stability

Requirements

3+ years as a platform/infrastructure engineer at a startup or growth-stage company
Strong backend engineering: can design, build, and ship production services
Comfortable across the infrastructure stack: cloud (AWS or Azure), Kubernetes, Docker, CI/CD, networking, monitoring
Experience managing production deployments and debugging issues across application and infrastructure layers
Can context-switch between writing product code and doing infra/ops work without treating either as outside the scope of the job

Preferred

Experience with data-heavy applications - document processing pipelines, batch and real-time data workflows
Worked with ML/AI systems in production - model serving, GPU infrastructure, pipeline orchestration
Built infrastructure at an early-stage company where you were one of few engineers owning the full stack
Familiarity with building third-party integrations into a product is a plus