
Advanced Document Processing and Query Answering – A PoC with Google Gemini Models

STL Digital recently completed a Proof of Concept (PoC) in advanced document processing and query answering, with promising results. The project tackled the complexity of handling diverse document formats and returning insightful answers. The achievements and next steps are summarized below.

The Objective

The PoC aimed to overcome the obstacles associated with processing and querying documents in a variety of formats, including:

  • Text
  • Images (including technical diagrams)
  • Large tables
  • Scanned handwritten and typewritten documents

The Tech Stack

To tackle this multifaceted challenge, the team used the following tech stack:

  • Google Gemini Models: These models provided the language understanding and generation at the core of the system.
  • LangChain: This framework orchestrated the interaction with the language models and tied the processing steps together.
  • Streamlit: This tool provided an intuitive chat UI through which users query the system.

Domain Expertise and Prompt Engineering

To help the Q&A Knowledge Bot grasp domain-specific nuances, the team fine-tuned the gemini-1.0 model with Supervised Fine-Tuning (SFT) on a large dataset of industry terms, abbreviations, and definitions. Prompt engineering complemented the fine-tuning: prompts supplied domain-specific context together with passages retrieved through similarity search, guiding the model toward precise responses.
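The prompt-assembly step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PoC's code: the toy bag-of-words vectors stand in for a real embedding model, and the `GLOSSARY` entries, `DOCS`, function names, and prompt wording are all hypothetical. In the real system, the assembled prompt would be sent to the fine-tuned Gemini model.

```python
import math
import re
from collections import Counter

# Hypothetical glossary of domain abbreviations (illustrative entries only).
GLOSSARY = {
    "SLA": "Service Level Agreement",
    "RCA": "Root Cause Analysis",
}

# Hypothetical document chunks; a real system would chunk the parsed source files.
DOCS = [
    "The SLA requires 99.9% uptime for the billing service.",
    "Quarterly RCA reports are stored in the archive.",
    "The cafeteria opens at 8am.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    # Domain context: expand any glossary abbreviation that appears in the question.
    terms = [f"- {abbr}: {meaning}" for abbr, meaning in GLOSSARY.items()
             if re.search(rf"\b{abbr}\b", question)]
    context = "\n\n".join(retrieve(question, chunks, k))
    return (
        "You are a domain expert. Answer using only the context below.\n"
        "Glossary:\n" + ("\n".join(terms) or "- (none)") + "\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("What uptime does the SLA require?", DOCS, k=1))
```

Keeping the glossary lookup separate from the similarity search means abbreviations are expanded even when no retrieved passage happens to define them.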

Testing and Insights

The system was rigorously tested using questions of varying complexity:

  • Simple: Direct queries about specific information
  • Medium: Questions requiring contextual understanding and inference
  • Complex: Multi-faceted queries involving detailed analysis of large datasets
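A tiered test set like this is commonly scored per tier, so that weaknesses on harder questions are not hidden by an overall average. The sketch below is hypothetical: the `TestCase` fields, the substring-based `grade` check, and the stub answers are illustrative assumptions, not the PoC's evaluation method or data.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TestCase:
    question: str
    tier: str      # "simple", "medium" or "complex"
    expected: str  # key fact a correct answer must mention (assumed criterion)

def grade(answer: str, case: TestCase) -> bool:
    # Crude assumption: an answer passes if it mentions the expected fact.
    return case.expected.lower() in answer.lower()

def pass_rate_by_tier(cases, answer_fn):
    """Run each question through answer_fn and return the pass rate per tier."""
    totals = defaultdict(lambda: [0, 0])  # tier -> [passed, total]
    for case in cases:
        totals[case.tier][0] += grade(answer_fn(case.question), case)
        totals[case.tier][1] += 1
    return {tier: passed / total for tier, (passed, total) in totals.items()}
```

In practice `answer_fn` would wrap the full retrieval-plus-Gemini pipeline; a dictionary of canned answers (via `dict.get`) is enough to exercise the scoring logic.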

The system performed robustly across all question types and document formats. Key insights from the PoC included:

  • Performance Variability: Although the solution effectively extracted and understood diverse data types, certain content formats presented challenges, indicating a need for supplementary processing strategies.
  • User Interaction: The Streamlit-based UI proved to be highly intuitive, significantly improving the query resolution experience.

The Road Ahead

Based on these findings, the following enhancements are planned:

  • Complex Query Decomposer: This will use advanced NLP techniques to break intricate, multi-part queries into simpler sub-queries that can be answered more effectively.
  • Enhanced Image Handling: Combining structured storage with LLM-powered query generation will improve the system's ability to process image data.
  • Advanced Table Recognition: This will enhance the parsing of complex table structures.
  • Integration of Domain-Specific Components: This will further refine the system’s handling of specialized terminology and context.
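One possible reading of "structured storage plus LLM-powered query generation" for image data is sketched below, under stated assumptions. Details extracted from images (here, components of a technical diagram) go into a relational table, and an LLM translates a user's question into SQL over that table. The table schema, `SAMPLE_ROWS`, and the `generate_sql` stub are hypothetical. In a real system, `generate_sql` would prompt a Gemini model with the schema and the question.

```python
import sqlite3

# Hypothetical rows extracted from a technical diagram (e.g. by a vision model).
SAMPLE_ROWS = [
    ("diagram-1", "P-101", "pump", "V-201"),
    ("diagram-1", "V-201", "valve", "T-301"),
    ("diagram-1", "P-102", "pump", "T-301"),
]

def build_store(rows):
    """Load extracted diagram facts into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE components (diagram TEXT, name TEXT, kind TEXT, connects_to TEXT)"
    )
    conn.executemany("INSERT INTO components VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    # Make the connection read-only so model-generated SQL cannot modify data.
    conn.execute("PRAGMA query_only = ON")
    return conn

def generate_sql(question):
    """Stand-in for an LLM call that would receive the schema plus the question
    and return SQL. Hard-coded here so the sketch runs without a model."""
    return "SELECT name FROM components WHERE kind = 'pump' ORDER BY name"

def answer(conn, question, generate=generate_sql):
    sql = generate(question).strip()
    # Accept only a single SELECT statement from the model.
    if not sql.lower().startswith("select") or ";" in sql.rstrip(";"):
        raise ValueError(f"refusing to run generated SQL: {sql!r}")
    return [row[0] for row in conn.execute(sql)]

print(answer(build_store(SAMPLE_ROWS), "Which components are pumps?"))
# ['P-101', 'P-102']
```

Letting the model write a query, rather than answer directly from a description of the image, keeps answers over large diagrams exact and auditable. The read-only pragma and the SELECT-only check limit the damage a malformed generated query can do.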

Conclusion

This PoC validated the approach and showed the potential of combining Google Gemini models, LangChain, and Streamlit for advanced document processing. The team now looks ahead to the next phase of development and to incorporating the improvements listed above.
