Data Insights
Date: 2025-09-10 · Duration: 75.5 min · Organizer: Nader · Attendees: Nader, Ahmad, Mike, Martin
Summary
- MVP for document processing, led by Martin and Ahmad, includes a classifier and updated Confluence documentation for better workflow clarity.
- Redesigned merchant creation: clicking ‘Add merchant’ instantly creates a blank merchant, improving the user experience.
- File upload optimization: uploads start as soon as users drag and drop files, significantly streamlining the process.
- Document classification pipeline employs Python libraries for extraction and LLM-based classification, improving accuracy.
- Specialized-agent architecture proposed: a primary classifier assigns the document type, then type-specific agents refine classification and processing.
- Fallback mechanisms for classification errors discussed to ensure robustness in data handling and processing.
- Hybrid vision and text extraction strategy confirmed for document processing to maximize accuracy and efficiency.
- MCA facing external delays because the EMS file provider has not delivered the necessary data, impacting the project timeline.
- Commission calculation accuracy remains a concern due to missing merchant data and the uncertain reliability of flat-rate formulas.
- Auto-filling of applications prioritized for the next release, followed by documentation functionality for merchant creation.
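The two-step classification with fallbacks discussed above (primary classifier, then a type-specific agent) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's implementation: `classify_document`, `ClassificationResult`, the `unknown` label, the 0.7 confidence threshold, and the `bank_statement` type are all hypothetical, and the LLM call is replaced by a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical label for documents that could not be classified reliably.
UNKNOWN = "unknown"


@dataclass
class ClassificationResult:
    doc_type: str
    confidence: float
    parsed: Optional[dict]
    used_fallback: bool


def classify_document(
    text: str,
    primary: Callable[[str], tuple[str, float]],
    specialists: dict[str, Callable[[str], dict]],
    min_confidence: float = 0.7,  # hypothetical threshold
) -> ClassificationResult:
    """Two-step classification: primary classifier, then a type-specific agent."""
    # Step 1: the primary classifier (an LLM call in practice) proposes a type.
    try:
        doc_type, confidence = primary(text)
    except Exception:
        # Fallback 1: the classifier itself failed, so flag for review.
        return ClassificationResult(UNKNOWN, 0.0, None, True)

    # Fallback 2: low confidence, or a type that has no specialist agent.
    if confidence < min_confidence or doc_type not in specialists:
        return ClassificationResult(UNKNOWN, confidence, None, True)

    # Step 2: hand the document to the agent registered for its type.
    try:
        parsed = specialists[doc_type](text)
    except Exception:
        # Fallback 3: keep the type but report that parsing failed.
        return ClassificationResult(doc_type, confidence, None, True)
    return ClassificationResult(doc_type, confidence, parsed, False)


def demo_primary(text: str) -> tuple[str, float]:
    # Stand-in for the LLM classifier; the keyword rule is purely illustrative.
    if "statement" in text.lower():
        return ("bank_statement", 0.92)
    return ("other", 0.40)


# Hypothetical agent table; an existing statement parser could plug in here.
agents = {"bank_statement": lambda text: {"parser": "statement"}}
result = classify_document("Monthly account statement", demo_primary, agents)
print(result.doc_type, result.used_fallback)  # bank_statement False
```

Returning a flagged `unknown` result instead of raising is just one possible fallback policy; the notes say only that fallback mechanisms were discussed, not which ones were chosen.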
Action Items
- Martin
  - Complete work on document generation endpoint and UI integration tasks (32:00)
  - Create UI ticket for highlighting functionality in merchant application workflow
  - Implement commission-related tickets for version without estimates (01:00:56)
  - Continue work on document processing MVP with classifier integration (32:00)
- Ahmadreza
  - Modify document upload endpoint to support optional type classification
  - Research and implement pymupdf library for better document text/image extraction instead of full vision-based approach
  - Implement two-step classification system: primary classifier followed by specialized agents
  - Integrate existing statement analyzer output parser for statement document types
  - Move back to markdown extraction approach from PDFs rather than full image conversion
- Nader
  - Create R&D ticket for ML-based commission approximation approach using tree-based regression (01:12:19)
  - Follow up with EMS provider regarding missing data files for MCA functionality
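The hybrid text/vision extraction and the optional-type upload item could fit together roughly like this. A minimal sketch, assuming: each page's text layer comes from a library such as PyMuPDF (for example via `Page.get_text()`); pages with little extractable text are the ones sent to the vision model; and "optional type classification" means the uploader may pass a known type that skips the classifier. `plan_upload`, `UploadPlan`, and the 50-character threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical threshold: pages with fewer extractable characters than this
# are treated as scanned images and routed to the vision model instead.
MIN_TEXT_CHARS = 50


@dataclass
class UploadPlan:
    doc_type: Optional[str]
    text_pages: list[int] = field(default_factory=list)
    vision_pages: list[int] = field(default_factory=list)


def plan_upload(
    page_texts: list[str],
    declared_type: Optional[str] = None,
    classify: Optional[Callable[[str], str]] = None,
    min_chars: int = MIN_TEXT_CHARS,
) -> UploadPlan:
    """Pick text or vision extraction per page and resolve the document type.

    page_texts holds each page's text layer (e.g. from PyMuPDF);
    an empty or near-empty string means no usable text layer.
    """
    text_pages, vision_pages = [], []
    for index, text in enumerate(page_texts):
        if len(text.strip()) >= min_chars:
            text_pages.append(index)
        else:
            vision_pages.append(index)

    # A caller-supplied type wins; otherwise classify the extracted text.
    doc_type = declared_type
    if doc_type is None and classify is not None and text_pages:
        combined = "\n".join(page_texts[i] for i in text_pages)
        doc_type = classify(combined)
    return UploadPlan(doc_type, text_pages, vision_pages)


pages = ["Account statement for September " * 3, "", "   "]
plan = plan_upload(pages, classify=lambda text: "bank_statement")
print(plan)
# UploadPlan(doc_type='bank_statement', text_pages=[0], vision_pages=[1, 2])
```

Routing per page rather than per document keeps the cheaper text path for digital PDFs while still covering scanned pages; the notes do not say which granularity the team intends.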