Instacart Data Science: Basket Size, Composition and Co-Occurrence Patterns

Shopping Basket “Splash” & Composition Analysis
Quick Summary
This project explores basket composition and “splash effects” in retail shopping data. The goal is to determine which products act as “anchors” — items that, when purchased, are associated with larger baskets or higher-value co-purchases. By identifying these anchors, marketing recruitment dollars can be allocated more efficiently toward customers or product categories that generate higher ROI through expanded basket depth.
“The secret of successful retailing is to give your customers what they want. And really, if you think about it from your point of view as a customer, you want everything: a wide assortment of good quality merchandise; the lowest possible prices; guaranteed satisfaction with what you buy; friendly, knowledgeable service; convenient hours; free parking; a pleasant shopping experience.” -Sam Walton
Methodology
1. Data Sourcing
- The dataset is Instacart Market Basket Analysis, sourced from Kaggle, a Google-owned data science community that hosts public datasets. The data is used here purely for experimentation and does not reflect real data.
- Transaction-level records include Transaction ID, Customer ID, Product ID/Category, Quantity, and Price.
2. Data Preparation
- Transform raw transactions into basket-level views (one row per transaction).
- Encode product presence using binary flags (1 = purchased, 0 = not purchased).
- Aggregate product sales frequencies to identify candidate anchors.
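The preparation steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual script: the `line_items` rows are hypothetical toy data, and the helper names (`build_baskets`, `binary_flags`, `product_frequency`) are made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical line items: (order_id, product_name). Toy data for illustration only.
line_items = [
    (1, "Banana"), (1, "Organic Strawberries"), (1, "Yellow Onions"),
    (2, "Banana"), (2, "Organic Baby Spinach"),
    (3, "Yellow Onions"), (3, "Banana"), (3, "Large Lemon"),
]

def build_baskets(items):
    """Group line items into one set of distinct products per order."""
    baskets = defaultdict(set)
    for order_id, product in items:
        baskets[order_id].add(product)
    return dict(baskets)

def binary_flags(baskets):
    """Return the sorted product list and a 0/1 presence vector per order."""
    products = sorted({p for basket in baskets.values() for p in basket})
    flags = {
        order_id: [1 if p in basket else 0 for p in products]
        for order_id, basket in baskets.items()
    }
    return products, flags

def product_frequency(baskets):
    """Count how many baskets contain each product."""
    return Counter(p for basket in baskets.values() for p in basket)

baskets = build_baskets(line_items)
products, flags = binary_flags(baskets)
freq = product_frequency(baskets)
print(freq.most_common(2))  # the two most frequent products are the first anchor candidates
```

Storing each basket as a set counts a product once per order, which matches the "purchased / not purchased" flag rather than quantity.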
3. Anchor Selection
Anchors are chosen using a hybrid approach:
- Frequency-based: Top N most common items/categories.
- Value-based: High-revenue or high-margin products.
- Association-based: Items with strong co-occurrence patterns (high lift/support).
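One way to combine the frequency-based and value-based criteria is to take the union of the two top-N lists. The sketch below assumes that approach; the prices and rows are hypothetical, and the function names are illustrative. The association-based criterion is left out here because it needs the lift figures from the splash step.

```python
from collections import Counter

# Hypothetical line items: (order_id, product, price). Toy values for illustration only.
line_items = [
    (1, "Banana", 0.25), (1, "Olive Oil", 9.00),
    (2, "Banana", 0.25), (2, "Yellow Onions", 1.00),
    (3, "Banana", 0.25), (3, "Yellow Onions", 1.00), (3, "Olive Oil", 9.00),
    (4, "Large Lemon", 0.50),
]

def top_by_frequency(items, n):
    """Products that appear in the most distinct orders (ties broken by name)."""
    distinct = {(order_id, product) for order_id, product, _ in items}
    counts = Counter(product for _, product in distinct)
    return sorted(counts, key=lambda p: (-counts[p], p))[:n]

def top_by_revenue(items, n):
    """Products with the highest summed price (ties broken by name)."""
    revenue = Counter()
    for _, product, price in items:
        revenue[product] += price
    return sorted(revenue, key=lambda p: (-revenue[p], p))[:n]

def candidate_anchors(items, n):
    """Union of the frequency-based and value-based top-N lists."""
    return sorted(set(top_by_frequency(items, n)) | set(top_by_revenue(items, n)))

anchors = candidate_anchors(line_items, n=1)
print(anchors)
```

A union keeps cheap, high-frequency items and expensive, lower-frequency items side by side, so the later basket comparison can show which kind of anchor matters more.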
4. Basket Composition Analysis
For each anchor:
- Calculate average basket size (number of distinct items).
- Calculate average basket value (sum of prices).
- Compare across anchors to see which lead to larger or more valuable baskets.
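As a rough sketch of the per-anchor comparison, the snippet below computes average basket size and average basket value for baskets that contain each anchor. The rows and prices are hypothetical, quantity is ignored for simplicity, and `basket_stats_by_anchor` is an illustrative name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical line items: (order_id, product, price). Toy values for illustration only.
line_items = [
    (1, "Banana", 0.25), (1, "Olive Oil", 9.00), (1, "Pasta", 2.00),
    (2, "Banana", 0.25), (2, "Yellow Onions", 1.00),
    (3, "Yellow Onions", 1.00), (3, "Pasta", 2.00),
    (3, "Olive Oil", 9.00), (3, "Large Lemon", 0.50),
]

def basket_stats_by_anchor(items, anchors):
    """Average distinct-item count and summed price of baskets containing each anchor."""
    baskets = defaultdict(dict)  # order_id -> {product: total price}
    for order_id, product, price in items:
        baskets[order_id][product] = baskets[order_id].get(product, 0.0) + price

    stats = {}
    for anchor in anchors:
        with_anchor = [b for b in baskets.values() if anchor in b]
        if not with_anchor:
            continue  # anchor never purchased in this sample
        stats[anchor] = {
            "orders": len(with_anchor),
            "avg_size": mean(len(b) for b in with_anchor),
            "avg_value": mean(sum(b.values()) for b in with_anchor),
        }
    return stats

stats = basket_stats_by_anchor(line_items, ["Banana", "Yellow Onions"])
for anchor, row in stats.items():
    print(anchor, row)
```

Comparing each anchor's averages against the all-basket average helps separate a genuine anchor effect from items that simply appear in most orders.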
5. Splash (Co-Occurrence) Analysis
- Build a co-occurrence matrix of anchor products vs. secondary products.
- Compute probability of co-purchase and lift ratios.
- Visualize with heatmaps or network graphs to show strongest splash effects.
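The co-purchase probability (confidence) and lift can be computed directly from item and pair counts. The sketch below uses the standard definitions, confidence(A → B) = count(A and B) / count(A) and lift = confidence / share of baskets containing B. The baskets are hypothetical and `splash` is an illustrative helper name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one set of products per order. Toy data for illustration only.
baskets = [
    {"Yellow Onions", "Banana", "Large Lemon"},
    {"Yellow Onions", "Banana"},
    {"Banana", "Organic Baby Spinach"},
    {"Yellow Onions", "Large Lemon"},
    {"Organic Baby Spinach"},
]

def cooccurrence(baskets):
    """Count baskets per item and per unordered item pair."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))
    return item_counts, pair_counts

def splash(anchor, other, baskets):
    """Return (confidence, lift) for anchor -> other."""
    item_counts, pair_counts = cooccurrence(baskets)
    if item_counts[anchor] == 0 or item_counts[other] == 0:
        return 0.0, 0.0  # undefined when either item never appears
    both = pair_counts[tuple(sorted((anchor, other)))]
    confidence = both / item_counts[anchor]
    lift = confidence / (item_counts[other] / len(baskets))
    return confidence, lift

confidence, lift = splash("Yellow Onions", "Large Lemon", baskets)
print(round(confidence, 3), round(lift, 3))
```

A lift above 1 means the pair co-occurs more often than independence would predict. Lift can be inflated for rarely purchased items, so a minimum support threshold is a common safeguard before ranking pairs.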
Results
Timing & Engagement
- Peak Day: Sunday (17.3% of orders)
- Lowest Day: Thursday (12.5% of orders)
- Peak Hour: 10 AM (8.5% of orders)
Recommendation: Schedule push notifications and promotions on Sunday mornings and reinforce mid-week traffic with Thursday specials.
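Day and hour shares like those above come from simple order counts. The sketch below assumes each order carries a day-of-week code and an hour, as in the `order_dow` and `order_hour_of_day` columns of the Kaggle orders file. The rows are hypothetical, and reading code 0 as Sunday is an assumption rather than something the dataset documents.

```python
from collections import Counter

# Hypothetical (order_dow, order_hour_of_day) pairs. Toy data for illustration only.
orders = [(0, 10), (0, 10), (0, 14), (1, 10), (4, 9), (6, 15), (0, 11), (3, 10)]

def share(values):
    """Fraction of orders falling in each bucket."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

dow_share = share(day for day, _ in orders)
hour_share = share(hour for _, hour in orders)

peak_day = max(dow_share, key=dow_share.get)
peak_hour = max(hour_share, key=hour_share.get)
print(peak_day, dow_share[peak_day], peak_hour, hour_share[peak_hour])
```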
Product & Assortment Strategy
Top Products | Order Count | Role |
---|---|---|
Banana | 472K | Basket driver |
Bag of Organic Bananas | 379K | Basket driver |
Organic Strawberries | 265K | Basket driver |
Organic Baby Spinach | 242K | Basket driver |
Organic Hass Avocado | 214K | Basket driver |
Concentration: Top 20 products drive 10.3% of line items; top 100 capture 23.1% of total volume.
Recommendation: Feature hero SKUs in search banners and guarantee availability — these products are basket drivers. Maintain depth in high-frequency SKUs.
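The concentration figures are a top-k share: sort products by line-item count and divide the sum of the top k by the total. A minimal sketch, with hypothetical counts and an illustrative function name:

```python
def top_k_share(counts, k):
    """Share of all line items accounted for by the k most-ordered products."""
    ordered = sorted(counts.values(), reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0

# Hypothetical line-item counts per product. Toy data for illustration only.
product_counts = {
    "Banana": 50,
    "Organic Strawberries": 30,
    "Organic Baby Spinach": 10,
    "Large Lemon": 6,
    "Jamaican Allspice": 4,
}

print(top_k_share(product_counts, 1), top_k_share(product_counts, 2))
```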
Cross-Sell & Bundling
High-Lift Pairs (from top_pairs_by_lift.csv):
- Blueberries ↔ Regular Sliced Bacon (Lift ≈ 57)
- Blueberries ↔ Jamaican Allspice (Lift ≈ 57)
- Blueberries ↔ Orange Liqueur (Lift ≈ 57)
Anchor Recommendations:
- By Lift: Yellow Onions → Original Lager (Lift ≈ 44)
- By Confidence: Yellow Onions → Banana (29%), Large Lemon (14.7%), Organic Baby Spinach (13.8%)
Recommendation: Bundle fresh produce (bananas, spinach, lemons) with staple onions in recipe kits; explore discovery bundles (e.g., produce + beverages) based on high-lift but niche pairs.
Category & Department Insights
- Department Penetration: Produce (75%), Dairy & Eggs (68%), Beverages (45%)
- Cross-Department Synergy: Alcohol + Other (Lift ≈ 3.1), Alcohol + Pets (Lift ≈ 2.5)
- Aisle Hotspots: Fresh Fruits (56%), Fresh Vegetables (44%), Packaged Vegetables & Fruits (37%)
Recommendation: Promote Produce + Dairy pairings (e.g., fruit + yogurt). Leverage alcohol cross-sell potential with pet or “other” categories for curated lifestyle bundles.
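Department penetration is read here as the share of orders containing at least one item from a department. Under that assumption it can be sketched as follows; the rows are hypothetical and `penetration` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical line items: (order_id, department). Toy data for illustration only.
line_items = [
    (1, "produce"), (1, "produce"), (1, "dairy eggs"),
    (2, "produce"), (2, "beverages"),
    (3, "dairy eggs"),
    (4, "produce"),
]

def penetration(items):
    """Share of distinct orders that include at least one item per department."""
    all_orders = set()
    dept_orders = defaultdict(set)
    for order_id, department in items:
        all_orders.add(order_id)
        dept_orders[department].add(order_id)
    return {dept: len(ids) / len(all_orders) for dept, ids in dept_orders.items()}

dept_penetration = penetration(line_items)
print(dept_penetration)
```

Because one order can span several departments, penetration values do not sum to 100%.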
Loyalty & Reorder Programs
Replenishment: With a 59% reorder rate and top SKUs (bananas, spinach, strawberries) recurring heavily, there’s strong scope for auto-replenishment.
Recommendation: Offer subscription boxes or reorder reminders for high-frequency fresh items; trigger nudges around the 7–10 day median cycle.
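A reorder rate and a median reorder cycle can be derived along these lines. The sketch assumes a 0/1 reorder flag per line item and a days-since-prior-order value per order that is missing for a customer's first order. All values are hypothetical.

```python
from statistics import median

# Hypothetical line items: (product, reordered_flag). Toy data for illustration only.
line_items = [
    ("Banana", 1), ("Banana", 1), ("Banana", 0),
    ("Organic Baby Spinach", 1), ("Jamaican Allspice", 0),
]

# Hypothetical days since the customer's previous order; None marks a first order.
days_since_prior = [7, None, 10, 3, 8, 30, None, 9]

# Share of line items that were bought before by the same customer.
reorder_rate = sum(flag for _, flag in line_items) / len(line_items)

# Median gap between consecutive orders, ignoring first orders.
gaps = [d for d in days_since_prior if d is not None]
median_gap = median(gaps)

print(reorder_rate, median_gap)
```

The median is less sensitive than the mean to the occasional long gap, which makes it a more stable trigger for reminder timing.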
Implementation Roadmap
Quick Wins (0–1 month)
- Launch “Frequently Bought Together” widgets (Blueberries + Bacon, Onions + Bananas).
- Target push notifications for Sunday 10 AM.
Mid-Term (1–3 months)
- Homepage redesign: feature Produce & Dairy.
- Introduce produce–dairy recipe bundles and discovery promos around unusual high-lift pairs.
Long-Term (3–6 months)
- Subscription program for high-reorder fresh items.
- Category-level recommender leveraging co-occurrence matrix.
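A category-level recommender built on co-occurrence could start as simply as the sketch below: for every category already in the cart, add up the confidence toward each category not yet in it, then rank. This is one possible design, not a specification of the planned system. The baskets are hypothetical and the function names are illustrative.

```python
from collections import Counter
from itertools import permutations

# Hypothetical baskets expressed as sets of categories. Toy data for illustration only.
baskets = [
    {"fresh fruits", "yogurt", "fresh vegetables"},
    {"fresh fruits", "yogurt"},
    {"fresh fruits", "fresh vegetables"},
    {"fresh vegetables", "packaged cheese"},
    {"fresh fruits", "yogurt", "packaged cheese"},
]

def fit(baskets):
    """Count baskets per category and per ordered category pair."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(permutations(basket, 2))
    return item_counts, pair_counts

def recommend(cart, item_counts, pair_counts, k=3):
    """Rank categories missing from the cart by summed confidence from cart categories."""
    scores = Counter()
    for (source, target), together in pair_counts.items():
        if source in cart and target not in cart:
            scores[target] += together / item_counts[source]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:k]]

item_counts, pair_counts = fit(baskets)
suggestions = recommend({"fresh fruits"}, item_counts, pair_counts, k=2)
print(suggestions)
```

Scoring by confidence favors popular categories; swapping in lift would surface less obvious pairings at the cost of more noise from rare ones.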