Instacart Data Science: Basket Size, Composition and Co-Occurrence Patterns

By Olivia Arkema
Shopping Basket Composition Analysis (Instacart Dataset)
Image by Alexa from Pixabay

Shopping Basket “Splash” & Composition Analysis


Quick Summary

This project explores basket composition and “splash effects” in retail shopping data. The goal is to identify which products act as “anchors”: items whose purchase is associated with larger baskets or higher-value co-purchases. Identifying these anchors lets marketing and customer-acquisition spend be directed toward the customers and product categories that generate the highest ROI through deeper baskets.



“The secret of successful retailing is to give your customers what they want. And really, if you think about it from your point of view as a customer, you want everything: a wide assortment of good quality merchandise; the lowest possible prices; guaranteed satisfaction with what you buy; friendly, knowledgeable service; convenient hours; free parking; a pleasant shopping experience.” -Sam Walton

Methodology

1. Data Sourcing

  • The dataset is Instacart Market Basket Analysis from Kaggle, a Google-owned data science community that hosts public datasets. It is used here purely for experimentation, and the findings should not be read as real business results.
  • Records are at the order-line level: order ID, customer (user) ID, product ID, aisle, department, add-to-cart position, and a reorder flag. The public Instacart files do not include price or quantity, so value-based metrics require an external price source.

2. Data Preparation

  • Transform raw transactions into basket-level views (one row per transaction).
  • Encode product presence using binary flags (1 = purchased, 0 = not purchased).
  • Aggregate product sales frequencies to identify candidate anchors.
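The three preparation steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the toy `order_lines` rows are invented and only mimic the (order_id, product) shape of Instacart's order-line files, and `build_baskets` and `one_hot` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Invented order lines in (order_id, product_name) form; not real Instacart rows.
order_lines = [
    (1, "Banana"), (1, "Organic Baby Spinach"), (1, "Banana"),
    (2, "Banana"), (2, "Yellow Onions"),
    (3, "Organic Strawberries"),
]


def build_baskets(lines):
    """Collapse order lines into one set of distinct products per order."""
    baskets = defaultdict(set)
    for order_id, product in lines:
        baskets[order_id].add(product)
    return dict(baskets)


def one_hot(baskets):
    """Binary presence flags per order: 1 = purchased, 0 = not purchased."""
    products = sorted({p for items in baskets.values() for p in items})
    rows = {oid: [int(p in items) for p in products] for oid, items in baskets.items()}
    return products, rows


baskets = build_baskets(order_lines)
products, rows = one_hot(baskets)
# Orders containing each product: the raw input for frequency-based anchors.
order_counts = Counter(p for items in baskets.values() for p in items)
print(products)
print(rows)
print(order_counts.most_common(1))
```

At Instacart scale, a sparse matrix or a pandas DataFrame would replace the dense lists; the logic stays the same.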

3. Anchor Selection

Anchors are chosen using a hybrid approach:

  • Frequency-based: Top N most common items/categories.
  • Value-based: High-revenue or high-margin products.
  • Association-based: Items with strong co-occurrence patterns (high lift/support).
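One simple way to combine the three criteria is to take the union of the top-N items under each. The sketch below assumes per-item metrics have already been computed; the metric names (`orders`, `revenue`, `max_lift`), the `hybrid_anchors` helper, and all numbers are hypothetical. `revenue` in particular would need price data from outside the Instacart files.

```python
def hybrid_anchors(metrics, top_n=1):
    """Union of the top-N items by order frequency, by revenue, and by best lift."""
    anchors = set()
    for criterion in ("orders", "revenue", "max_lift"):
        ranked = sorted(metrics, key=lambda item: metrics[item][criterion], reverse=True)
        anchors.update(ranked[:top_n])
    return anchors


# Hypothetical per-item metrics (toy numbers, not results from this analysis).
metrics = {
    "Banana": {"orders": 50, "revenue": 12.5, "max_lift": 1.3},
    "Organic Baby Spinach": {"orders": 30, "revenue": 90.0, "max_lift": 1.8},
    "Yellow Onions": {"orders": 20, "revenue": 20.0, "max_lift": 6.0},
    "Large Lemon": {"orders": 25, "revenue": 15.0, "max_lift": 2.0},
    "Original Lager": {"orders": 5, "revenue": 60.0, "max_lift": 5.0},
}
print(sorted(hybrid_anchors(metrics, top_n=1)))
print(sorted(hybrid_anchors(metrics, top_n=2)))
```

A weighted composite score is an alternative; the union keeps each criterion's winners visible instead of letting one signal dominate.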

4. Basket Composition Analysis

For each anchor:

  • Calculate average basket size (number of distinct items).
  • Calculate average basket value (sum of item prices, where price data is available).
  • Compare across anchors to see which lead to larger or more valuable baskets.
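The per-anchor comparison can be sketched as below. Baskets are sets of distinct items, so size is the distinct-item count. The `prices` lookup is hypothetical, since the public Instacart files carry no prices, and the value calculation ignores quantity.

```python
from statistics import mean


def basket_profile(baskets, anchor, prices=None):
    """Average distinct-item count, and optionally average value, of baskets containing `anchor`."""
    with_anchor = [b for b in baskets if anchor in b]
    if not with_anchor:
        return None
    profile = {"orders": len(with_anchor), "avg_size": mean(len(b) for b in with_anchor)}
    if prices is not None:
        # Sum of unit prices of the distinct items; quantities are not modeled.
        profile["avg_value"] = mean(sum(prices[i] for i in b) for b in with_anchor)
    return profile


baskets = [
    {"Banana", "Organic Baby Spinach", "Large Lemon"},
    {"Banana"},
    {"Yellow Onions", "Original Lager"},
]
# Hypothetical unit prices for illustration only.
prices = {"Banana": 0.25, "Organic Baby Spinach": 3.0, "Large Lemon": 0.75,
          "Yellow Onions": 1.0, "Original Lager": 9.0}
for anchor in ("Banana", "Yellow Onions"):
    print(anchor, basket_profile(baskets, anchor, prices))
```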

5. Splash (Co-Occurrence) Analysis

  • Build a co-occurrence matrix of anchor products vs. secondary products.
  • Compute probability of co-purchase and lift ratios.
  • Visualize with heatmaps or network graphs to show strongest splash effects.
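A small sketch of the anchor-by-secondary matrix, using the standard definition lift = P(A and B) / (P(A) * P(B)). Rows and columns come back as nested lists so they can be handed to a heatmap routine; the baskets and the `cooccurrence_matrix` name are illustrative assumptions.

```python
from collections import Counter


def cooccurrence_matrix(baskets, anchors, secondaries):
    """Return (counts, lift) as nested lists: rows are anchors, columns are secondaries."""
    n = len(baskets)
    sets = [set(b) for b in baskets]
    freq = Counter(item for s in sets for item in s)
    # counts[i][j]: orders containing both anchors[i] and secondaries[j]
    counts = [[sum(1 for s in sets if a in s and b in s) for b in secondaries] for a in anchors]
    # lift[i][j] = P(a and b) / (P(a) * P(b)); 0.0 when either item never occurs
    lift = [
        [
            (c / n) / ((freq[a] / n) * (freq[b] / n)) if freq[a] and freq[b] else 0.0
            for b, c in zip(secondaries, row)
        ]
        for a, row in zip(anchors, counts)
    ]
    return counts, lift


baskets = [
    {"Yellow Onions", "Banana"},
    {"Yellow Onions", "Original Lager"},
    {"Banana"},
    {"Banana", "Organic Baby Spinach"},
]
counts, lift = cooccurrence_matrix(
    baskets, ["Yellow Onions"], ["Banana", "Original Lager", "Organic Baby Spinach"]
)
print(counts)
print([[round(x, 2) for x in row] for row in lift])
```

Lift above 1 means the pair co-occurs more often than independent purchasing would predict; here the rarer Original Lager scores higher than the ubiquitous Banana.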



Key Metrics

  • Scale: 3.2M prior orders analyzed
  • Users / Products: 206K unique users / 49.7K unique products
  • Average Basket Size: 10.1 items (median 8; 4.9% of orders are single-item)
  • Reorder Behavior: 59% reorder rate; median order cycle of 7 days (average 10.7)
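The headline numbers above can be reproduced from Instacart-shaped inputs roughly as follows. This sketch assumes that basket size counts line items per order, that the 59% reorder figure is the share of order lines flagged `reordered`, and that the order cycle comes from `days_since_prior_order` (empty for a user's first order). The toy rows are invented.

```python
from collections import Counter
from statistics import mean, median


def headline_metrics(lines, days_since_prior):
    """lines: (order_id, product_id, reordered) tuples; days_since_prior: order_id -> days or None."""
    sizes = Counter(order_id for order_id, _, _ in lines)  # line items per order
    gaps = [d for d in days_since_prior.values() if d is not None]
    return {
        "orders": len(sizes),
        "avg_basket": mean(sizes.values()),
        "median_basket": median(sizes.values()),
        "single_item_share": sum(1 for s in sizes.values() if s == 1) / len(sizes),
        "reorder_rate": sum(r for _, _, r in lines) / len(lines),
        "median_cycle_days": median(gaps),
        "avg_cycle_days": mean(gaps),
    }


# Invented rows shaped like order_products: (order_id, product_id, reordered).
lines = [(1, 10, 1), (1, 11, 0), (1, 12, 1), (2, 10, 1), (3, 13, 0), (3, 10, 1)]
days_since_prior = {1: None, 2: 7.0, 3: 14.0}
print(headline_metrics(lines, days_since_prior))
```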

Timing & Engagement

  • Peak Day: Sunday (17.3% of orders)
  • Lowest Day: Thursday (12.5% of orders)
  • Peak Hour: 10 AM (8.5% of orders)
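Shares by day and hour are normalized order counts. Instacart stores the weekday as an integer column `order_dow`; mapping those integers to day names (0 is commonly assumed to be Sunday) is an assumption, so this sketch keeps days numeric. The orders and the `share_by` and `peak_and_low` helpers are illustrative.

```python
from collections import Counter


def share_by(orders, key):
    """Share of orders for each value of `key`, e.g. "order_dow" or "order_hour_of_day"."""
    counts = Counter(order[key] for order in orders)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def peak_and_low(shares):
    """Return the values with the highest and lowest share."""
    return max(shares, key=shares.get), min(shares, key=shares.get)


# Invented orders using Instacart's column names.
orders = [
    {"order_dow": 0, "order_hour_of_day": 10},
    {"order_dow": 0, "order_hour_of_day": 10},
    {"order_dow": 0, "order_hour_of_day": 9},
    {"order_dow": 1, "order_hour_of_day": 10},
    {"order_dow": 1, "order_hour_of_day": 15},
    {"order_dow": 4, "order_hour_of_day": 10},
]
day_shares = share_by(orders, "order_dow")
hour_shares = share_by(orders, "order_hour_of_day")
print(peak_and_low(day_shares), peak_and_low(hour_shares))
```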

Recommendation: Schedule push notifications and promotions on Sunday mornings and reinforce mid-week traffic with Thursday specials.

Product & Assortment Strategy


Top Product               Order Count   Role
Banana                    472K          Basket driver
Bag of Organic Bananas    379K          Basket driver
Organic Strawberries      265K          Basket driver
Organic Baby Spinach      242K          Basket driver
Organic Hass Avocado      214K          Basket driver

Concentration: The top 20 products account for 10.3% of line items; the top 100 account for 23.1%.
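The concentration figures are cumulative shares of line items captured by the most frequent products. A minimal version over an invented list of line items, with `top_n_share` as a hypothetical helper:

```python
from collections import Counter


def top_n_share(line_items, n):
    """Fraction of all line items accounted for by the n most frequent products."""
    counts = Counter(line_items)
    top = sum(count for _, count in counts.most_common(n))
    return top / sum(counts.values())


# Invented line items: one entry per product per order.
line_items = ["Banana"] * 5 + ["Organic Strawberries"] * 3 + ["Large Lemon", "Original Lager"]
print(top_n_share(line_items, 1), top_n_share(line_items, 2))
```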

Recommendation: Feature these hero SKUs in search banners and guarantee their availability, since they are basket drivers. Maintain inventory depth in high-frequency SKUs.

Cross-Sell & Bundling

High-Lift Pairs (from top_pairs_by_lift.csv):

  • Blueberries ↔ Regular Sliced Bacon (Lift ≈ 57)
  • Blueberries ↔ Jamaican Allspice (Lift ≈ 57)
  • Blueberries ↔ Orange Liqueur (Lift ≈ 57)
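All three Blueberries pairs share the same lift of about 57. Since lift is P(A and B) / (P(A) * P(B)), a partner item that only ever appears in orders that also contain Blueberries gets a lift of exactly 1 / P(Blueberries), whichever partner it is. Identical values are therefore a hint, though not proof, that these pairs rest on very few orders. A common guard is a minimum pair-count (support) floor before ranking by lift. The sketch below shows that filter; `pairs_by_lift`, `min_pair_orders`, and the toy baskets are assumptions.

```python
from collections import Counter
from itertools import combinations


def pairs_by_lift(baskets, min_pair_orders=2):
    """Rank product pairs by lift, skipping pairs seen in fewer than `min_pair_orders` orders."""
    n = len(baskets)
    item_orders = Counter(item for b in baskets for item in set(b))
    pair_orders = Counter(pair for b in baskets for pair in combinations(sorted(set(b)), 2))
    ranked = []
    for (a, b), together in pair_orders.items():
        if together < min_pair_orders:
            continue  # too rare for lift to be trustworthy
        lift = (together / n) / ((item_orders[a] / n) * (item_orders[b] / n))
        ranked.append((a, b, together, lift))
    return sorted(ranked, key=lambda row: row[3], reverse=True)


baskets = [
    {"Blueberries", "Jamaican Allspice"},  # a one-off niche pairing
    {"Blueberries", "Banana"},
    {"Blueberries", "Banana"},
    {"Banana"},
    {"Banana", "Organic Strawberries"},
    {"Organic Strawberries"},
]
print(pairs_by_lift(baskets, min_pair_orders=1)[0])  # lift = 1 / P(Blueberries)
print(pairs_by_lift(baskets, min_pair_orders=2))
```

Without the floor, the single Allspice order tops the ranking with lift 2.0, which equals 1 / P(Blueberries) here; with the floor, only the repeated pairing survives.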

Anchor Recommendations:

  • By Lift: Yellow Onions → Original Lager (Lift ≈ 44)
  • By Confidence: Yellow Onions → Banana (29%), Large Lemon (14.7%), Organic Baby Spinach (13.8%)
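Ranking an anchor's partners by confidence, P(partner | anchor), favors items that are popular overall, such as Banana. Ranking by lift, confidence / P(partner), favors items that are rare overall. This is why the two anchor lists above diverge. A small sketch with invented baskets; `rank_partners` is a hypothetical helper.

```python
from collections import Counter


def rank_partners(baskets, anchor, by="confidence", top_k=3):
    """Rank products bought with `anchor` by confidence P(partner | anchor) or by lift."""
    n = len(baskets)
    sets = [set(b) for b in baskets]
    freq = Counter(item for s in sets for item in s)
    with_anchor = [s for s in sets if anchor in s]
    if not with_anchor:
        return []
    co = Counter(item for s in with_anchor for item in s if item != anchor)
    scores = {}
    for partner, count in co.items():
        confidence = count / len(with_anchor)
        # lift = P(partner | anchor) / P(partner)
        scores[partner] = confidence if by == "confidence" else confidence / (freq[partner] / n)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


baskets = [
    {"Yellow Onions", "Banana"},
    {"Yellow Onions", "Banana", "Large Lemon"},
    {"Yellow Onions", "Original Lager"},
    {"Banana"},
    {"Banana"},
    {"Banana", "Large Lemon"},
]
print(rank_partners(baskets, "Yellow Onions", by="confidence", top_k=1))
print(rank_partners(baskets, "Yellow Onions", by="lift"))
```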

Recommendation: Bundle fresh produce (bananas, spinach, lemons) with staple onions in recipe kits; explore discovery bundles (e.g., produce + beverages) based on high-lift but niche pairs.

Category & Department Insights

  • Department Penetration: Produce (75%), Dairy & Eggs (68%), Beverages (45%)
  • Cross-Department Synergy: Alcohol + Other (Lift ≈ 3.1), Alcohol + Pets (Lift ≈ 2.5)
  • Aisle Hotspots: Fresh Fruits (56%), Fresh Vegetables (44%), Packaged Vegetables & Fruits (37%)
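Assume penetration means the share of orders containing at least one item from a department; aisle penetration works the same way with aisle names. Under that assumption, penetration and cross-department lift can be sketched over invented orders, each listed by the department of every line item. Both helper names are hypothetical.

```python
from collections import Counter


def department_penetration(orders):
    """Share of orders containing at least one item from each department."""
    n = len(orders)
    counts = Counter(dept for order in orders for dept in set(order))
    return {dept: count / n for dept, count in counts.items()}


def department_pair_lift(orders, a, b):
    """Lift of departments a and b appearing in the same order."""
    n = len(orders)
    sets = [set(order) for order in orders]
    p_a = sum(a in s for s in sets) / n
    p_b = sum(b in s for s in sets) / n
    p_ab = sum(a in s and b in s for s in sets) / n
    return p_ab / (p_a * p_b) if p_a and p_b else 0.0


# Invented orders: one department name per line item.
orders = [
    ["produce", "produce", "dairy eggs"],
    ["produce"],
    ["alcohol", "pets"],
    ["produce", "dairy eggs"],
]
print(department_penetration(orders))
print(department_pair_lift(orders, "alcohol", "pets"))
```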

Recommendation: Promote Produce + Dairy pairings (e.g., fruit + yogurt). Leverage alcohol cross-sell potential with pet or “other” categories for curated lifestyle bundles.

Loyalty & Reorder Programs

Replenishment: With a 59% reorder rate and top SKUs (bananas, spinach, strawberries) recurring heavily, there’s strong scope for auto-replenishment.

Recommendation: Offer subscription boxes or reorder reminders for high-frequency fresh items; trigger nudges between the 7-day median and the 10.7-day average order interval.
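One way to act on the interval statistics is to send a reminder once the time since a user's last order reaches that user's own median interval. Users with little history fall back to the population median of 7 days. This rule, the `min_history` threshold, the helper names, and the toy users are assumptions for illustration, not part of the analysis.

```python
from statistics import median


def personal_nudge_day(gaps, fallback=7.0, min_history=3):
    """Hypothetical rule: a user's own median interval, or the population median if history is short."""
    return median(gaps) if len(gaps) >= min_history else fallback


def users_to_nudge(history, days_since_last):
    """Users whose time since their last order has reached their nudge day."""
    return sorted(
        user for user, gaps in history.items()
        if days_since_last[user] >= personal_nudge_day(gaps)
    )


# Invented per-user order intervals (days) and days since each user's last order.
history = {"u1": [7, 7, 8], "u2": [14, 15, 13], "u3": [5]}
days_since_last = {"u1": 7, "u2": 10, "u3": 8}
print(users_to_nudge(history, days_since_last))
```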

Implementation Roadmap

Quick Wins (0–1 month)

  • Launch “Frequently Bought Together” widgets (Blueberries + Bacon, Onions + Bananas).
  • Target push notifications for Sunday 10 AM.

Mid-Term (1–3 months)

  • Homepage redesign: feature Produce & Dairy.
  • Introduce produce–dairy recipe bundles and discovery promos around unusual high-lift pairs.

Long-Term (3–6 months)

  • Subscription program for high-reorder fresh items.
  • Category-level recommender leveraging co-occurrence matrix.

Visualizations

Figure: Co-Occurrence of the Top 20 Products
Figure: Top 50 Products by Frequency of Order
