This Spring Boot application provides fast search capabilities for a large product dataset (400K+ products) using both traditional database queries and Apache Lucene full-text search indexing.
- Product Management: Complete CRUD operations for products
- CSV Import: Bulk import of products from CSV files
- Lucene Search: High-performance full-text search with sub-second response times
For a dataset of 400K products:
- Database Search: 2-30 seconds depending on complexity
- Lucene Search: 5-50 milliseconds (orders of magnitude faster)
- `GET /api/productBySupplier/{supplierIds}?brandSearch={brand}&itemDescriptionSearch={description}&limit={limit}` - Advanced supplier search with fuzzy filters
- `POST /api/search/index/rebuild` - Rebuild the Lucene index
- `GET /api/search/index/stats` - Get status of the Lucene index
Primary Index (Optimized for Performance):
- supplier: Supplier ID (primary search field - fastest performance)
Secondary Index:
- productId: Product identifier
- itemDescription: Product description (searchable but not optimized)
Stored Fields (retrievable but not indexed for search):
- supplierGroupId: Supplier group identifier
- smktsMerchCategory: Merchandise category
- liqMerchCategory: Liquor merchandise category
- digitalBrandName: Digital brand name
- subBrandName: Sub-brand name
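The field layout above can be sketched with Lucene's document API. This is a minimal illustration, not the application's actual indexing code: the field names follow the index description, while the sample values are made up.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class ProductDocumentSketch {

    // Builds a Lucene Document mirroring the index layout described above.
    static Document toDocument() {
        Document doc = new Document();

        // Primary index: exact-match keyword field, the fastest lookup path.
        doc.add(new StringField("supplier", "959609", Field.Store.YES));

        // Secondary index: identifier plus analyzed, searchable description.
        doc.add(new StringField("productId", "12345", Field.Store.YES));
        doc.add(new TextField("itemDescription", "Coles FREE FROM Crackers 125g", Field.Store.YES));

        // Stored-only fields: retrievable with results, not searchable.
        doc.add(new StoredField("supplierGroupId", "SG-01"));
        doc.add(new StoredField("smktsMerchCategory", "Biscuits"));
        doc.add(new StoredField("liqMerchCategory", ""));
        doc.add(new StoredField("digitalBrandName", "Coles"));
        doc.add(new StoredField("subBrandName", "FREE FROM"));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = toDocument();
        System.out.println(doc.get("supplier"));        // 959609
        System.out.println(doc.get("itemDescription")); // Coles FREE FROM Crackers 125g
    }
}
```

`StringField` values are indexed as single tokens (exact match), `TextField` values are tokenized for full-text search, and `StoredField` values are only retrievable, which matches the three tiers above.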
```
mvn clean install
mvn spring-boot:run
```

On startup, the application will:
- Import CSV data from `src/main/resources/data-all.csv`
- Create the H2 database with optimized settings for 400K products
- Build the Lucene search index
- Start the web server on port 8080
The containerisation of this application is based on the `azul/zulu-openjdk:21` image.
To build the container, run:

```
docker build -t springio/salesforce-poc-springboot .
```

To start the container, run:

```
docker run -p 8080:8080 -t springio/salesforce-poc-springboot
```
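A minimal Dockerfile consistent with the commands above might look like the following sketch. It assumes the Spring Boot jar has already been built to `target/` via `mvn clean install`; adjust the jar name if your version differs.

```dockerfile
# Base image named in the section above
FROM azul/zulu-openjdk:21

# Copy the built Spring Boot fat jar into the image
COPY target/salesforce-poc-0.0.1-SNAPSHOT.jar /app.jar

# Port the embedded web server listens on
EXPOSE 8080

# Heap sizing matches the recommended JVM settings below
ENTRYPOINT ["java", "-Xms2g", "-Xmx6g", "-jar", "/app.jar"]
```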
The application is optimized for large datasets with:
- Persistent H2 Database: Data survives application restarts
- Connection Pooling: 20 max connections, 5 minimum idle
- Batch Processing: 50 records per batch for optimal performance
- Second-level Caching: Enabled for frequently accessed data
- JPA Optimizations: Batch inserts, query optimization
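These settings typically map to Spring Boot configuration along these lines. This is an illustrative `application.properties` fragment based on the figures above, not the project's actual file; the second-level cache additionally requires a cache provider on the classpath.

```properties
# Persistent H2 database file (survives restarts)
spring.datasource.url=jdbc:h2:file:./data/productdb

# HikariCP connection pool: 20 max connections, 5 minimum idle
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5

# JPA batch inserts: 50 records per batch
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true

# Second-level caching for frequently accessed data
spring.jpa.properties.hibernate.cache.use_second_level_cache=true
```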
For 400K products, recommended JVM settings:
```
java -Xms2g -Xmx6g -jar salesforce-poc-0.0.1-SNAPSHOT.jar
```

Example queries:

```
# Basic supplier search
curl "http://localhost:8080/api/productBySupplier/12345"

# Advanced search: suppliers + brand + item description (fuzzy search)
curl "http://localhost:8080/api/productBySupplier/959609,980801?brandSearch=Coles&itemDescriptionSearch=FREE&limit=10"

# Search by supplier and brand only
curl "http://localhost:8080/api/productBySupplier/959609?brandSearch=Taste&limit=10"

# Search by supplier and description only (fuzzy search)
curl "http://localhost:8080/api/productBySupplier/959609?itemDescriptionSearch=CRACKER&limit=10"

# Just supplier search (no filters)
curl "http://localhost:8080/api/productBySupplier/959609?limit=5"
```

H2 Console available at: http://localhost:8080/h2-console
- JDBC URL: `jdbc:h2:file:./data/productdb`
- Username: `sa`
- Password: (empty)
For large datasets, increase JVM heap size:
```
export MAVEN_OPTS="-Xms2g -Xmx6g"
mvn spring-boot:run
```

Minimum requirements:
- RAM: 4GB (2GB for application, 2GB for OS)
- Storage: 5GB (3GB for database, 1GB for index, 1GB for application)
- CPU: 2 cores minimum, 4+ recommended for optimal performance
Recommended:
- RAM: 8GB+ (6GB for application heap)
- Storage: SSD recommended for database and index files
- CPU: 4+ cores for concurrent search operations