Writing
My writing on Anomalo, Instacart and other topics.
Stepping Back
Announces my decision to step back from an operating role at Anomalo due to health changes. Reflects on my career through the lens of proving a well-known VC wrong, both about making Instacart profitable and about creating Anomalo.
Automating Data Quality (O’Reilly)
A practical guide to building and maintaining automated data quality monitoring systems that scale across cloud data platforms. It explains how to design and test unsupervised learning models for issue detection, implement effective alerting and resolution workflows, and manage these systems enterprise-wide.
An All-In Founder Forum
Recounts the origin and structure of a peer forum of venture-backed founders focused on vulnerability, accountability, and personal growth. Shares operating norms, meeting cadence, and an open invite for a new member.
Build Data Factories, Not Data Warehouses
Modern organizations operate data factories (not warehouses) that transform raw inputs into dynamic, customized data products. To ensure these factories produce trustworthy outputs, data teams must invest in scalable, automated quality control systems that empower data consumers, minimize false alerts, and validate data at every stage, especially before delivery.
Detecting Extreme Data Events
Introduces Anomalo’s entity outlier check to catch rare, high-impact events by pairing time-series anomaly detection with automated root-cause analysis. Demonstrates the approach with NYC 311 data and contrasts it with generic outlier methods that fail in real-world settings.
Effective Data Monitoring: Steps to Minimize False Alerts
Offers ten actionable steps to reduce false positives and false negatives in data monitoring systems and to calibrate alerting thresholds. It guides teams in balancing sensitivity against noise so that alerts remain trusted.
Trust Your Data with Unsupervised Data Monitoring
Presents how unsupervised learning can detect unexpected anomalies in data without requiring labeled incidents. It shows how combining forecasting with unsupervised signals helps improve coverage over simple rules-based monitoring.
Airbnb Quality Data For All
Airbnb’s growth demanded strong automated systems to keep its data complete, timely, and reliable. The article shows how companies can achieve similar data quality using tools like Anomalo without massive budgets.
Dynamic Data Testing
Defines a framework that progresses from static rules to dynamic, model-based tests and unsupervised detection, giving higher coverage with less maintenance. Uses EU COVID-19 data to show how predicted ranges outperform hand-tuned thresholds and avoid missed anomalies.
When Data Disappears
Explains why missing or partially missing data is the most common—and dangerous—data quality failure, often invisible in aggregate metrics. Outlines practical tests to detect staleness, shortfalls, and segment drop-offs, with alerting workflows for rapid triage.
700 Women Founders
Analyzes 2009–2013 Crunchbase data to estimate the share of venture-backed companies with women founders and ranks VC portfolios by representation. Argues for transparency and accountability by using data to surface diversity gaps in venture funding.
Space, Time and Groceries
Frames Instacart grocery delivery as a spatiotemporal logistics problem and describes the architecture used to optimize it. Uses Datashader to visualize massive Instacart GPS datasets that reveal how shoppers move through cities, stores, and delivery routes.
How Instacart Uses Data to Craft A Bespoke Comp Strategy
Details Instacart’s compensation methodology combining survey data and regression modeling, mapped to precise leveling to ensure fairness and competitiveness. Shares outcomes and tactics for market positioning, equity education, and calibration across roles and seniority.
3 Million Instacart Orders, Open Sourced
Instacart released an anonymized public dataset of over 3 million grocery orders from more than 200,000 users to support machine learning research on consumer purchasing behavior. The article introduces the dataset, highlights privacy protections, and shares some initial insights from the data.
Deep Learning with Emojis (not Math)
Offers an intuition-first explanation of deep learning concepts (using emojis) aimed at non-technical readers, and connects them to Instacart’s work on ranking and in-store route optimization. It shows how deep learning ideas can be communicated without math-heavy exposition.
Doing Data Science Right — Your Most Common Questions Answered
Provides concise, high-leverage answers to recurring challenges in building and scaling data science teams, from scope setting to stakeholder alignment. It distills practical wisdom from leaders across many companies into a Q&A format.
Data Science at Instacart
Describes how the data science organization at Instacart works in partnership with product and engineering to drive key decisions and outcomes. Outlines the data opportunities in forecasting, ads, recommendations, and search optimization.
How to Consistently Hire Remarkable Data Scientists
Describes a structured approach to recruiting, evaluating, calibrating, and retaining exceptional data science talent, with emphasis on projects that reflect real work. It highlights how to design take-homes and interview loops that predict long-term success.