# Ballistic Import Pipeline
A high-level overview of how merchant data flows through the Spring ETL system.

---
## Purpose
This document explains how the Ballistic backend:
1. Fetches merchant product feeds (CSV/TSV)
2. Normalizes raw data into structured entities
3. Updates products and offers in an idempotent way
4. Supports two sync modes:
   - Full Import
   - Offer-Only Sync
---
# 1. High-Level Flow
## ASCII Diagram
```
┌──────────────────────────┐
│ /admin/imports/{id}      │
│ (Full Import Trigger)    │
└─────────────┬────────────┘
              │
              ▼
┌────────────────────────────────┐
│ importMerchantFeed(merchantId) │
└─────────────┬──────────────────┘
              │
              ▼
┌────────────────────────────────────────────────────────┐
│ readFeedRowsForMerchant()                              │
│   - auto-detect delimiter                              │
│   - parse CSV/TSV → MerchantFeedRow objects            │
└─────────────┬──────────────────────────────────────────┘
              │ List<MerchantFeedRow>
              ▼
┌────────────────────────────────────────┐
│ For each MerchantFeedRow row:          │
│   resolveBrand()                       │
│   upsertProduct()                      │
│     - find existing via brand+mpn/upc  │
│     - update fields (mapped partRole)  │
│   upsertOfferFromRow()                 │
└────────────────────────────────────────┘
```
---
# 2. Full Import Explained
Triggered by:
```
POST /admin/imports/{merchantId}
```
### Step 1 — Load merchant
Using `merchantRepository.findById()`.
### Step 2 — Parse feed rows
`readFeedRowsForMerchant()`:
- Auto-detects delimiter (`\t`, `,`, `;`)
- Validates required headers
- Parses each row into `MerchantFeedRow`
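
The delimiter detection and header validation above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `detectCsvFormat`: the class name `DelimiterDetector`, the "most frequent candidate wins" heuristic, and the case-insensitive header comparison are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper; the project's own detectCsvFormat may work differently. */
final class DelimiterDetector {
    // Candidate delimiters listed in this document: tab, comma, semicolon.
    private static final char[] CANDIDATES = {'\t', ',', ';'};

    /**
     * Picks the candidate that occurs most often in the header line.
     * Ties go to the earlier candidate; a line with no candidate falls back to comma.
     */
    static char detect(String headerLine) {
        char best = ',';
        int bestCount = 0;
        for (char candidate : CANDIDATES) {
            int count = 0;
            for (int i = 0; i < headerLine.length(); i++) {
                if (headerLine.charAt(i) == candidate) {
                    count++;
                }
            }
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }

    /** Returns the required headers (compared case-insensitively) that the header line lacks. */
    static List<String> missingHeaders(String headerLine, char delimiter, List<String> required) {
        Set<String> present = new HashSet<>();
        for (String header : headerLine.split(Pattern.quote(String.valueOf(delimiter)), -1)) {
            present.add(header.trim().toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!present.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Counting on the header line alone is the simplest heuristic; a sturdier detector might also check that the chosen delimiter yields the same column count on the first few data rows.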
### Step 3 — Process each row
For each parsed row:
#### a. resolveBrand()
- Finds or creates brand
- Defaults to “Aero Precision” if missing
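
A find-or-create lookup with a default can be sketched as below. The in-memory map stands in for whatever brand repository the project uses, and the case-insensitive matching is an assumption made for the example; only the "Aero Precision" default comes from this document.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of resolveBrand(); persistence is replaced by an in-memory map. */
final class BrandResolver {
    static final String DEFAULT_BRAND = "Aero Precision"; // default named in this document

    // Stand-in for a brand repository, keyed by lower-cased name (an assumption).
    private final Map<String, String> brandsByKey = new HashMap<>();

    /** Returns the stored brand name, creating it on first sight; blank input maps to the default. */
    String resolveBrand(String rawBrand) {
        String name = (rawBrand == null || rawBrand.isBlank()) ? DEFAULT_BRAND : rawBrand.trim();
        return brandsByKey.computeIfAbsent(name.toLowerCase(Locale.ROOT), key -> name);
    }

    int brandCount() {
        return brandsByKey.size();
    }
}
```

With this sketch, `resolveBrand(" Magpul ")` followed by `resolveBrand("MAGPUL")` yields one stored brand, spelled as it was first seen.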
#### b. upsertProduct()
Deduplicates by trying, in order:
1. Brand + MPN
2. Brand + UPC (currently a SKU placeholder)

If neither lookup matches, a new product is created. The product then receives:
- Name + slug
- Descriptions
- Images
- MPN/identifiers
- Platform inference
- Category mapping
- Part role inference
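
The two-step lookup described above (brand + MPN first, then brand + UPC, otherwise create) might look like the sketch below. The `Product` record and the in-memory list are simplified stand-ins for the real entity and repository, and the field updates that follow a match are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the dedupe step in upsertProduct(). */
final class ProductMatcher {
    /** Simplified stand-in for the Product entity; only the match keys are modeled. */
    record Product(String brand, String mpn, String upc) {}

    private final List<Product> products = new ArrayList<>();

    /** Tries brand + MPN first, then brand + UPC; blank keys are never used for matching. */
    Optional<Product> findExisting(String brand, String mpn, String upc) {
        if (mpn != null && !mpn.isBlank()) {
            for (Product p : products) {
                if (p.brand().equals(brand) && mpn.equals(p.mpn())) {
                    return Optional.of(p);
                }
            }
        }
        if (upc != null && !upc.isBlank()) {
            for (Product p : products) {
                if (p.brand().equals(brand) && upc.equals(p.upc())) {
                    return Optional.of(p);
                }
            }
        }
        return Optional.empty();
    }

    /** Returns the matching product, or creates and stores a new one when nothing matches. */
    Product upsert(String brand, String mpn, String upc) {
        return findExisting(brand, mpn, upc).orElseGet(() -> {
            Product created = new Product(brand, mpn, upc);
            products.add(created);
            return created;
        });
    }

    int size() {
        return products.size();
    }
}
```

Skipping blank keys matters here: without that guard, two unrelated rows that both lack an MPN would collapse into a single product.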
#### c. upsertOfferFromRow()
Creates or updates a ProductOffer:
- Prices
- Stock
- Buy URL
- lastSeenAt
- firstSeenAt when newly created
The operation is idempotent: re-importing the same row updates the existing offer rather than creating a duplicate.
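
One way to get this behavior is to key each offer on a stable identifier and set `firstSeenAt` only on creation. The sketch below assumes the key is merchant id plus the feed SKU and uses an in-memory map instead of a repository; both are illustrative choices, not confirmed details of `upsertOfferFromRow()`.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an idempotent offer upsert. */
final class OfferUpserter {
    /** Simplified stand-in for the ProductOffer entity. */
    static final class Offer {
        BigDecimal price;
        BigDecimal originalPrice;
        boolean inStock;
        String buyUrl;
        Instant firstSeenAt;
        Instant lastSeenAt;
    }

    // Assumed natural key: merchantId + ":" + sku.
    private final Map<String, Offer> offers = new HashMap<>();

    Offer upsert(long merchantId, String sku, BigDecimal price, BigDecimal originalPrice,
                 boolean inStock, String buyUrl, Instant now) {
        Offer offer = offers.computeIfAbsent(merchantId + ":" + sku, key -> {
            Offer created = new Offer();
            created.firstSeenAt = now; // set once, when the offer is first created
            return created;
        });
        offer.price = price;
        offer.originalPrice = originalPrice;
        offer.inStock = inStock;
        offer.buyUrl = buyUrl;
        offer.lastSeenAt = now; // refreshed on every import
        return offer;
    }

    int size() {
        return offers.size();
    }
}
```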
---
# 3. Offer-Only Sync
Triggered by:
```
POST /admin/imports/{merchantId}/offers-only
```
Does NOT:
- Create products
- Update product fields
It only updates:
- price
- originalPrice
- inStock
- buyUrl
- lastSeenAt
If the offer does not exist, it is skipped.
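
The update-or-skip rule can be sketched as follows. The `FeedRow` record, the SKU-keyed map, and the returned update count are assumptions for illustration; the actual `syncOffersOnly` and `upsertOfferOnlyFromRow` signatures are not shown in this document.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the offer-only sync: existing offers are updated, unknown ones skipped. */
final class OfferOnlySync {
    /** Simplified stand-in for the ProductOffer entity. */
    static final class Offer {
        BigDecimal price;
        BigDecimal originalPrice;
        boolean inStock;
        String buyUrl;
        Instant lastSeenAt;
    }

    /** Simplified stand-in for a parsed feed row. */
    record FeedRow(String sku, BigDecimal price, BigDecimal originalPrice, boolean inStock, String buyUrl) {}

    // Existing offers for one merchant, keyed by SKU (an assumed key).
    final Map<String, Offer> offersBySku = new HashMap<>();

    /** Applies the five offer fields to known offers and returns how many were updated. */
    int sync(List<FeedRow> rows, Instant now) {
        int updated = 0;
        for (FeedRow row : rows) {
            Offer existing = offersBySku.get(row.sku());
            if (existing == null) {
                continue; // no offer yet: skipped, never created in this mode
            }
            existing.price = row.price();
            existing.originalPrice = row.originalPrice();
            existing.inStock = row.inStock();
            existing.buyUrl = row.buyUrl();
            existing.lastSeenAt = now;
            updated++;
        }
        return updated;
    }
}
```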
---
# 4. Auto-Detecting CSV/TSV Parser
The parser:
- Attempts multiple delimiters
- Validates headers
- Handles malformed or short rows
- Never throws on missing columns
- Returns clean MerchantFeedRow objects
Designed for messy merchant feeds.
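
A minimal sketch of the "never throws on missing columns" behavior: cells are looked up by header name, and an unknown header or a short row yields an empty string. The class name is hypothetical, and the plain `split` used here does not handle quoted fields, which a real CSV parser would need to.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of tolerant cell access; quoted fields are not handled. */
final class TolerantRowReader {
    private final Map<String, Integer> indexByHeader = new HashMap<>();
    private final String splitRegex;

    TolerantRowReader(String headerLine, char delimiter) {
        this.splitRegex = Pattern.quote(String.valueOf(delimiter));
        String[] headers = headerLine.split(splitRegex, -1);
        for (int i = 0; i < headers.length; i++) {
            // The first occurrence wins if a header name is repeated.
            indexByHeader.putIfAbsent(headers[i].trim().toLowerCase(Locale.ROOT), i);
        }
    }

    /** Returns the trimmed cell, or "" when the column is unknown or the row is too short. */
    String value(String rowLine, String header) {
        Integer index = indexByHeader.get(header.toLowerCase(Locale.ROOT));
        if (index == null) {
            return "";
        }
        String[] cells = rowLine.split(splitRegex, -1);
        return index < cells.length ? cells[index].trim() : "";
    }
}
```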
---
# 5. Entities Updated During Import
### Product
- name
- slug
- short/long description
- main image
- mpn
- upc (future)
- platform
- rawCategoryKey
- partRole
### ProductOffer
- merchant
- product
- avantlinkProductId (SKU placeholder)
- price
- originalPrice
- inStock
- buyUrl
- lastSeenAt
- firstSeenAt
### Merchant
- lastFullImportAt
- lastOfferSyncAt
---
# 6. Extension Points
You can extend the import pipeline in these areas:
- Add per-merchant column mapping
- Add true UPC parsing
- Support multi-platform parts
- Improve partRole inference
- Implement global deduplication across merchants
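
As an illustration of the first extension point, per-merchant column mapping could be modeled as a lookup from canonical field names to the header each merchant actually uses. Everything in this sketch (class name, method names, the fallback to the canonical name) is hypothetical and not part of the current pipeline.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-merchant column mapping. */
final class MerchantColumnMapping {
    // merchantId -> (canonical field name -> header used in that merchant's feed)
    private final Map<Long, Map<String, String>> overrides = new HashMap<>();

    void register(long merchantId, String canonicalField, String feedHeader) {
        overrides.computeIfAbsent(merchantId, id -> new HashMap<>()).put(canonicalField, feedHeader);
    }

    /** Returns the merchant-specific header, or the canonical name when no override exists. */
    String headerFor(long merchantId, String canonicalField) {
        return overrides.getOrDefault(merchantId, Map.of()).getOrDefault(canonicalField, canonicalField);
    }
}
```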
---
# 7. Quick Reference: Main Methods
| Method | Purpose |
|--------|---------|
| importMerchantFeed | Full product + offer import |
| readFeedRowsForMerchant | Detect delimiter + parse feed |
| resolveBrand | Normalize brand names |
| upsertProduct | Idempotent product write |
| updateProductFromRow | Apply product fields |
| upsertOfferFromRow | Idempotent offer write |
| syncOffersOnly | Offer-only sync |
| upsertOfferOnlyFromRow | Update existing offers |
| detectCsvFormat | Auto-detect delimiter |
| fetchFeedRows | Simpler parser for offers |
---
# 8. Summary
The Ballistic importer is:
- Robust against bad data
- Idempotent and safe
- Flexible for multiple merchants
- Extensible for long-term scaling
This pipeline powers the product catalog and offer data for the Ballistic ecosystem.