Ballistic Import Pipeline
A high-level overview of how merchant data flows through the Spring ETL system.
Purpose
This document explains how the Ballistic backend:
- Fetches merchant product feeds (CSV/TSV)
- Normalizes raw data into structured entities
- Updates products and offers in an idempotent way
- Supports two sync modes:
  - Full Import
  - Offer-Only Sync
1. High-Level Flow
ASCII Diagram
┌──────────────────────────┐
│ /admin/imports/{id}      │
│ (Full Import Trigger)    │
└─────────────┬────────────┘
              │
              ▼
┌────────────────────────────────┐
│ importMerchantFeed(merchantId) │
└─────────────┬──────────────────┘
              │
              ▼
┌────────────────────────────────────────────────────────┐
│ readFeedRowsForMerchant()                              │
│   - auto-detect delimiter                              │
│   - parse CSV/TSV → MerchantFeedRow objects            │
└─────────────────┬──────────────────────────────────────┘
                  │ List<MerchantFeedRow>
                  ▼
┌────────────────────────────────────────┐
│ For each MerchantFeedRow row:          │
│   resolveBrand()                       │
│   upsertProduct()                      │
│     - find existing via brand+mpn/upc  │
│     - update fields (mapped partRole)  │
│   upsertOfferFromRow()                 │
└────────────────────────────────────────┘
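The flow in the diagram can be sketched in plain Java as a single loop over parsed rows. This is an illustration only: `ImportFlowSketch`, the trimmed-down `FeedRow` record, and the string product key are hypothetical stand-ins, and the three step methods just record a trace instead of touching a database.

```java
import java.util.ArrayList;
import java.util.List;

final class ImportFlowSketch {
    // Hypothetical, trimmed-down stand-in for the project's MerchantFeedRow.
    record FeedRow(String brand, String mpn, String price) {}

    // Records the order in which the per-row steps run.
    final List<String> trace = new ArrayList<>();

    // Mirrors the diagram: every row goes through brand, product, then offer.
    void importMerchantFeed(List<FeedRow> rows) {
        for (FeedRow row : rows) {
            String brand = resolveBrand(row);
            String productKey = upsertProduct(brand, row);
            upsertOfferFromRow(productKey, row);
        }
    }

    String resolveBrand(FeedRow row) {
        trace.add("resolveBrand");
        return row.brand();
    }

    String upsertProduct(String brand, FeedRow row) {
        trace.add("upsertProduct");
        return brand + "/" + row.mpn(); // hypothetical product key
    }

    void upsertOfferFromRow(String productKey, FeedRow row) {
        trace.add("upsertOfferFromRow " + productKey);
    }
}
```

The point of the sketch is the ordering: the offer step depends on the product resolved for the same row, and the product step depends on the resolved brand.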
2. Full Import Explained
Triggered by:
POST /admin/imports/{merchantId}
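As a rough sketch, the trigger request could be built with the JDK's `java.net.http` client. The base URL, the `long` merchant id type, and the class name are assumptions; any authentication the admin endpoints require is not covered in this document and is left out. The second helper builds the offers-only variant described in section 3.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ImportTriggerSketch {
    // baseUrl is a placeholder; the real host and port are not given in this document.
    static HttpRequest fullImport(String baseUrl, long merchantId) {
        return post(baseUrl + "/admin/imports/" + merchantId);
    }

    static HttpRequest offersOnly(String baseUrl, long merchantId) {
        return post(baseUrl + "/admin/imports/" + merchantId + "/offers-only");
    }

    // Both endpoints are POSTs; this sketch assumes they take no request body.
    private static HttpRequest post(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

A request built this way can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.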
Step 1 — Load merchant
Using merchantRepository.findById().
Step 2 — Parse feed rows
readFeedRowsForMerchant():
- Auto-detects the delimiter (tab, comma, or semicolon)
- Validates required headers
- Parses each row into MerchantFeedRow
Step 3 — Process each row
For each parsed row:
a. resolveBrand()
- Finds or creates brand
- Defaults to “Aero Precision” if missing
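A minimal find-or-create sketch of this step, using an in-memory map in place of the brand repository. The class name is hypothetical, and the case-insensitive matching is an assumption; the document does not say how brand names are normalized.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class BrandResolverSketch {
    static final String DEFAULT_BRAND = "Aero Precision"; // default named in this document

    // Stand-in for a brand repository: normalized key -> stored brand name.
    private final Map<String, String> brands = new LinkedHashMap<>();

    String resolveBrand(String rawBrand) {
        String name = (rawBrand == null || rawBrand.isBlank()) ? DEFAULT_BRAND : rawBrand.trim();
        // Assumption: brands are matched case-insensitively.
        String key = name.toLowerCase(Locale.ROOT);
        // Find the existing brand, or store this one as new.
        return brands.computeIfAbsent(key, k -> name);
    }

    int brandCount() {
        return brands.size();
    }
}
```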
b. upsertProduct()
Dedupes by:
- Brand + MPN
- Brand + UPC (currently SKU placeholder)
If no match → create new product.
Then applies:
- Name + slug
- Descriptions
- Images
- MPN/identifiers
- Platform inference
- Category mapping
- Part role inference
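The dedupe order above (brand + MPN first, then brand + UPC, otherwise create) might look roughly like this. The in-memory list stands in for the product table, and the `Product` class here is a hypothetical cut-down version that carries only a few of the listed fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class ProductUpsertSketch {
    // Hypothetical product; the real entity has more fields.
    static final class Product {
        final String brand;
        String mpn;
        String upc;
        String name;
        Product(String brand) { this.brand = brand; }
    }

    private final List<Product> products = new ArrayList<>(); // stand-in for the product table

    Product upsertProduct(String brand, String mpn, String upc, String name) {
        Product product = findByBrandAndMpn(brand, mpn)
                .or(() -> findByBrandAndUpc(brand, upc))
                .orElseGet(() -> {
                    Product created = new Product(brand);
                    products.add(created);
                    return created;
                });
        // Apply row fields to the matched or newly created product.
        if (mpn != null) product.mpn = mpn;
        if (upc != null) product.upc = upc;
        product.name = name;
        return product;
    }

    private Optional<Product> findByBrandAndMpn(String brand, String mpn) {
        if (mpn == null || mpn.isBlank()) return Optional.empty();
        return products.stream()
                .filter(p -> p.brand.equals(brand) && mpn.equals(p.mpn))
                .findFirst();
    }

    private Optional<Product> findByBrandAndUpc(String brand, String upc) {
        if (upc == null || upc.isBlank()) return Optional.empty();
        return products.stream()
                .filter(p -> p.brand.equals(brand) && upc.equals(p.upc))
                .findFirst();
    }

    int size() {
        return products.size();
    }
}
```

Running the same row twice returns the same product and only refreshes its fields, which is what makes the product write idempotent.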
c. upsertOfferFromRow()
Creates or updates a ProductOffer:
- Prices
- Stock
- Buy URL
- lastSeenAt
- firstSeenAt when newly created
Idempotent — does not duplicate offers.
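A sketch of the idempotent offer write, assuming offers are keyed by merchant plus the feed SKU. That key is an assumption, as are the class names; the document does not state which columns identify an offer. The timestamp is passed in so the behaviour of firstSeenAt and lastSeenAt is easy to see.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

final class OfferUpsertSketch {
    // Hypothetical offer holding a subset of the ProductOffer fields.
    static final class Offer {
        BigDecimal price;
        boolean inStock;
        String buyUrl;
        Instant firstSeenAt;
        Instant lastSeenAt;
    }

    // Stand-in for the offer table, keyed by "merchantId:sku" (assumed key).
    private final Map<String, Offer> offers = new HashMap<>();

    Offer upsertOfferFromRow(long merchantId, String sku, BigDecimal price,
                             boolean inStock, String buyUrl, Instant now) {
        Offer offer = offers.computeIfAbsent(merchantId + ":" + sku, k -> {
            Offer created = new Offer();
            created.firstSeenAt = now; // set only when the offer is new
            return created;
        });
        offer.price = price;
        offer.inStock = inStock;
        offer.buyUrl = buyUrl;
        offer.lastSeenAt = now; // refreshed on every import
        return offer;
    }

    int size() {
        return offers.size();
    }
}
```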
3. Offer-Only Sync
Triggered by:
POST /admin/imports/{merchantId}/offers-only
Does NOT:
- Create products
- Update product fields
It only updates:
- price
- originalPrice
- inStock
- buyUrl
- lastSeenAt
If the offer does not exist, it is skipped.
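The difference from the full import is that a missing offer is skipped rather than created. A minimal sketch, again with a hypothetical "merchantId:sku" key and an in-memory map in place of the repository:

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

final class OfferOnlySyncSketch {
    // Hypothetical offer holding only the fields this sync mode touches.
    static final class Offer {
        BigDecimal price;
        BigDecimal originalPrice;
        boolean inStock;
        String buyUrl;
        Instant lastSeenAt;
    }

    // Existing offers, keyed by an assumed "merchantId:sku" string.
    final Map<String, Offer> offers = new HashMap<>();

    /** Returns true if an existing offer was updated, false if the row was skipped. */
    boolean upsertOfferOnlyFromRow(String key, BigDecimal price, BigDecimal originalPrice,
                                   boolean inStock, String buyUrl, Instant now) {
        Offer offer = offers.get(key);
        if (offer == null) {
            return false; // no product or offer is created in this mode
        }
        offer.price = price;
        offer.originalPrice = originalPrice;
        offer.inStock = inStock;
        offer.buyUrl = buyUrl;
        offer.lastSeenAt = now;
        return true;
    }
}
```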
4. Auto-Detecting CSV/TSV Parser
The parser:
- Attempts multiple delimiters
- Validates headers
- Handles malformed or short rows
- Never throws on missing columns
- Returns clean MerchantFeedRow objects
Designed for messy merchant feeds.
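One way to sketch the "try each delimiter, validate headers" idea: split the header line with each candidate and keep the first one that yields all required columns. The class name and the required header names used in the usage note are hypothetical, and the split is deliberately naive (it does not handle quoted fields), so this is not the project's actual parser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class FeedFormatSketch {
    private static final char[] CANDIDATES = {'\t', ',', ';'};

    /** Tries each candidate and keeps the first whose header row has every required column. */
    static Optional<Character> detectDelimiter(String headerLine, Set<String> requiredHeaders) {
        for (char candidate : CANDIDATES) {
            if (split(headerLine, candidate).containsAll(requiredHeaders)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** Naive split: trims cells and keeps trailing empty cells; quoted fields are not handled. */
    static List<String> split(String line, char delimiter) {
        return Arrays.stream(line.split(Pattern.quote(String.valueOf(delimiter)), -1))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    /** Returns the cell at index, or an empty string when the row is too short. */
    static String cell(List<String> cells, int index) {
        return index >= 0 && index < cells.size() ? cells.get(index) : "";
    }
}
```

For example, `detectDelimiter("SKU;Brand;Name", Set.of("SKU", "Brand"))` selects the semicolon, and `cell(...)` lets row mapping read a missing column without throwing.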
5. Entities Updated During Import
Product
- name
- slug
- short/long description
- main image
- mpn
- upc (future)
- platform
- rawCategoryKey
- partRole
ProductOffer
- merchant
- product
- avantlinkProductId (SKU placeholder)
- price
- originalPrice
- inStock
- buyUrl
- lastSeenAt
- firstSeenAt
Merchant
- lastFullImportAt
- lastOfferSyncAt
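For orientation, the field lists above can be written down as plain Java records. Every type here is an assumption (the document lists names only), the id fields are hypothetical, and the real project most likely uses mutable persistence entities rather than records.

```java
import java.math.BigDecimal;
import java.time.Instant;

final class ImportEntitiesSketch {
    // Field names follow the lists above; all types are assumed.
    record Product(String name, String slug, String shortDescription, String longDescription,
                   String mainImageUrl, String mpn, String upc, String platform,
                   String rawCategoryKey, String partRole) {}

    // merchantId and productId stand in for the merchant and product references.
    record ProductOffer(long merchantId, long productId, String avantlinkProductId,
                        BigDecimal price, BigDecimal originalPrice, boolean inStock,
                        String buyUrl, Instant lastSeenAt, Instant firstSeenAt) {}

    record Merchant(long id, Instant lastFullImportAt, Instant lastOfferSyncAt) {}
}
```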
6. Extension Points
You can extend the import pipeline in these areas:
- Add per-merchant column mapping
- Add true UPC parsing
- Support multi-platform parts
- Improve partRole inference
- Implement global deduplication across merchants
7. Quick Reference: Main Methods
| Method | Purpose |
|---|---|
| importMerchantFeed | Full product + offer import |
| readFeedRowsForMerchant | Detect delimiter + parse feed |
| resolveBrand | Normalize brand names |
| upsertProduct | Idempotent product write |
| updateProductFromRow | Apply product fields |
| upsertOfferFromRow | Idempotent offer write |
| syncOffersOnly | Offer-only sync |
| upsertOfferOnlyFromRow | Update existing offers |
| detectCsvFormat | Auto-detect delimiter |
| fetchFeedRows | Simpler parser for offers |
8. Summary
The Ballistic importer is:
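- Robust against bad data: malformed or short rows and missing columns do not abort parsing
- Idempotent and safe: re-running an import updates existing products and offers instead of duplicating them
- Flexible for multiple merchants: each feed's delimiter is auto-detected
- Extensible for long-term scaling (see Extension Points)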
This pipeline powers the product catalog and offer data for the Ballistic ecosystem.