Staff Infrastructure & Performance Engineer

Nash

Date listed

1 week ago

Employment Type

Full time

Remote

Yes

Glassdoor Rating

4/5 (1 review)

Found on:

YCombinator Startups

San Francisco / Remote (US) · $180K – $300K + equity

About Nash

Logistics is the substrate beneath every economy that has ever existed, and it remains the least intelligently coordinated activity in the modern world. Consumer expectations are converging on instantaneous, perfect, free. Networks are not. We call this gap the Logistics Singularity, and closing it is the work.

Nash is the Autonomic Logistics OS. We unify decisioning, execution, and capacity into a single programmable system that pursues a business’s objectives continuously, adapts as conditions change, and runs the operation at equilibrium across orders, fleets, carriers, providers, and customers. The world’s largest retailers, grocers, and pharmacies (including Walmart, 7-Eleven, Woolworths, and Coles) run their critical workflows on Nash.

Nash was founded by Mahmoud Ghulman and Aziz Alghunaim and is backed by Y Combinator, a16z, and other top investors. We are based in San Francisco.

About the Role

We are looking for a Staff Infrastructure & Performance Engineer to own and evolve the performance, reliability, and scalability of the systems Nash runs on. You will work directly with engineering leadership and across platform and product engineering, designing and operating the low-latency, business-critical infrastructure that powers real-time logistics for some of the world’s largest retailers.

This is a senior, high-impact role. You will set technical direction on elastic capacity, multi-region availability, Postgres performance, cloud-native architecture, and enterprise-grade CI/CD. The autonomic system Nash provides is only as fast, as resilient, and as predictable as the infrastructure underneath it. Every decision, every dispatch, every promise to a customer stands on the floor you hold.

What You’ll Do

Own infrastructure performance and reliability across Nash’s production systems, with a focus on low latency, high throughput, and predictable behavior under load.
Design, build, and optimize AWS-based infrastructure, with a strong emphasis on managed services and ECS/Fargate.
Lead Postgres performance engineering: query optimization, indexing strategies, connection management, replication, cluster design, and failover.
Architect and operate multi-region, highly available systems with strong resiliency, disaster recovery, and failover guarantees.
Design and evolve enterprise-grade CI/CD pipelines that support safe, repeatable, fast deployments across environments and regions.
Drive observability standards (metrics, logs, tracing, SLOs) and use them to proactively eliminate performance bottlenecks before customers feel them.
Partner with application engineers to influence system design decisions that affect scalability, latency, and reliability.
Lead incident response and postmortems with a focus on root cause, systemic fixes, and long-term resilience.
Set infrastructure and performance best practices and mentor engineers across the org.

What You’ll Bring

6+ years building and operating high-scale, production infrastructure for business-critical systems.
Deep expertise in AWS, including networking, compute, storage, and managed services.
Hands-on experience running production workloads on ECS/Fargate at scale.
Strong Postgres background: performance tuning, replication, high availability, and operational excellence.
Proven experience designing and operating multi-region architectures with strict uptime and reliability requirements.
Strong understanding of CI/CD for enterprise deployments, including rollout strategies, environment isolation, and rollback safety.
Experience building low-latency systems where milliseconds matter.
Excellent debugging and systems-level problem-solving skills.
Ability to operate autonomously and lead technical initiatives in a fast-paced startup environment.
US-based. This role requires US work authorization (citizen or visa holder).

Bonus

Fluency with Terraform.
Kubernetes experience at production scale.
Background running infrastructure for real-time, integration-heavy, or AI-driven systems.
Prior staff or principal-level technical leadership at a high-growth startup.

Why This Role Matters

Nash is becoming the operating system that the world’s largest retailers, grocers, and pharmacies will run their logistics on. The infrastructure you build is the floor under all of it: every decision the autonomic system makes, every promise a merchant keeps, and every delivery a customer is waiting for depends on the latency, reliability, and predictability of what you operate. When the system holds, the world’s commerce keeps moving.

If you want to lead infrastructure at the scale where milliseconds and nines compound into real-world outcomes, and to set the technical bar that the rest of engineering builds against, this is the role.

What You’ll Love About Us

Early-stage, well-funded company with real revenue and global enterprise customers.
Massive ownership and direct collaboration with engineering leadership and the founders.
Quarterly team on-sites to bond and align in person.
Competitive compensation and meaningful equity.
Flexible paid time off.
Health, dental, and vision insurance.

EEOC

At Nash, we believe diverse teams are the strongest teams. We invite applicants of all genders, races, ethnicities, nationalities, ages, religions, sexual orientations, disability statuses, educational experiences, family situations, and socio-economic backgrounds.