Birthday Attack is an applied-AI startup designing and operating production machine learning workloads: hybrid recommendation models, embedding retrieval, and inference infrastructure tuned for real-time serving.
A production recommendation engine for anime, combining collaborative-filtering embeddings, content-based signals from a large review corpus, and a curated knowledge graph over creators, studios, and franchises.
Hybrid recommender pairing an XSimGCL graph neural network trained on hundreds of millions of user–title interactions with an XGBoost learning-to-rank reranker. NLP pipelines extract descriptive signals from the review corpus; an enrichment layer uses LLM inference for taxonomy and tagging. Served behind a low-latency Go backend with hot-reload embedding caches.
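The two-stage shape described above (embedding-based candidate retrieval followed by a rerank) can be sketched in a few lines. This is a minimal illustration, not the production system: a cosine nearest-neighbour pass stands in for the GNN retrieval, a plain linear scorer stands in for the XGBoost learning-to-rank model, and all item names, vectors, and features are invented.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def retrieve_candidates(user_vec, item_vecs, k=3):
    # Stage 1: nearest neighbours in embedding space (stand-in for GNN retrieval).
    scored = sorted(((cosine(user_vec, vec), item_id)
                     for item_id, vec in item_vecs.items()), reverse=True)
    return [item_id for _, item_id in scored[:k]]

def rerank(candidates, features, weights):
    # Stage 2: pointwise linear scorer standing in for the learning-to-rank model.
    def score(item_id):
        return sum(w * f for w, f in zip(weights, features[item_id]))
    return sorted(candidates, key=score, reverse=True)

user = [1.0, 0.0]
items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.2]}
candidates = retrieve_candidates(user, items)               # ["a", "b", "d"]
content_features = {"a": [0.1, 0.2], "b": [0.9, 0.8], "d": [0.5, 0.5]}
ranking = rerank(candidates, content_features, [1.0, 1.0])  # ["b", "d", "a"]
```

The point of the split is that the cheap first stage narrows a huge catalog to a small candidate set, so the expensive reranker only ever scores a handful of items per request.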
We build and ship production AI systems end-to-end — from model training through inference serving and the surrounding data infrastructure.
Hybrid recommender systems combining graph neural networks, embedding retrieval, and learning-to-rank rerankers — tuned for catalogs where structure and curation matter.
Low-latency model serving: in-memory embedding indices, hot-reload pipelines, and serving stacks engineered for sub-100ms response times under production load.
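The hot-reload idea behind that serving stack can be shown compactly. The production backend is Go, per the description above; this Python sketch with invented names only illustrates the pattern: queries read one snapshot reference, and a reload builds the replacement index off to the side before swapping it in, so no query ever observes a half-loaded index.

```python
import threading

class EmbeddingIndex:
    # In-memory embedding index with hot reload.
    def __init__(self, vectors):
        self._lock = threading.Lock()
        self._vectors = dict(vectors)

    def reload(self, new_vectors):
        snapshot = dict(new_vectors)   # build the new index off to the side
        with self._lock:
            self._vectors = snapshot   # then swap the reference in one step

    def top_k(self, query, k=2):
        vectors = self._vectors        # take one snapshot for the whole query
        dot = lambda u, v: sum(a * b for a, b in zip(u, v))
        return sorted(vectors, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]

index = EmbeddingIndex({"a": [1.0, 0.0], "b": [0.0, 1.0]})
before = index.top_k([1.0, 0.1])   # ["a", "b"]
index.reload({"a": [0.0, 1.0], "b": [1.0, 0.0]})
after = index.top_k([1.0, 0.1])    # ["b", "a"]
```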
LLM-driven content enrichment: classification, structured extraction, taxonomy generation, and tagging pipelines that augment downstream recommendation and search.
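One detail that makes LLM tagging pipelines safe downstream is validating model output against a controlled vocabulary. A minimal sketch, with an invented taxonomy and function name and the LLM call replaced by a raw response string:

```python
import json

# Illustrative controlled taxonomy; the real tag set is not shown here.
TAXONOMY = {"mecha", "slice-of-life", "sports", "psychological"}

def parse_tags(raw_response, taxonomy=TAXONOMY):
    # Parse the model's JSON reply and keep only tags from the controlled
    # taxonomy, so malformed replies and hallucinated labels never reach
    # downstream recommendation and search.
    try:
        payload = json.loads(raw_response)
        tags = payload.get("tags", [])
    except (json.JSONDecodeError, AttributeError):
        return []
    return [t for t in tags if t in taxonomy]

kept = parse_tags('{"tags": ["mecha", "space-opera"]}')  # ["mecha"]
bad = parse_tags("not valid json")                       # []
```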
A small team, deep ownership, and a bias toward shipping models that actually run in production.
Every model we train is built to be served — we measure offline metrics and online latency from day one.
Collaborative filtering rarely wins on its own. We blend graph structure, content signals, and curation.
Inference is engineered, not assumed. We optimize end-to-end paths, not just model forward passes.
From data ingestion through the serving layer, we build systems we can reason about and operate.
Working on production AI? Looking to partner on a recommendation or retrieval system? We take on a small number of engagements each year.