@c-rob Thank you for taking the time to explore potential solutions to my problem using AI. I sincerely appreciate your efforts and the insights you’ve shared from your own experience working with PDF documents and converting them to various formats.
I have tried using AI and language models to solve this problem before. However, the code they generate often relies on indentation to determine the hierarchy of the lists, which is not a reliable approach for my use case. Many of the AI-generated solutions also fail most of the test cases I’ve provided, which suggests they are not robust enough to handle the range of inputs and edge cases.
Your firsthand experience with processing OCR material and the challenges you encountered, such as inconsistencies in bullet markings, indentation, and whitespace, resonate with the difficulties I’m facing. Consistent input is indeed crucial for a reliable solution. While manually converting bullets to Markdown worked for your application, I’m hoping to find an automated approach that can handle the variations in my input data. Nevertheless, your insights have reinforced the importance of accounting for these factors when developing a solution.