AI “Expert” Word Problem-Solving

Relevant Skills: Mobile Development, LLMs

What is this project about?

Current LLMs offer significant advantages when solving domain-specific word problems, such as high accuracy in identifying and classifying important information. LLMs also introduce pitfalls, primarily a lack of verifiable correctness. Our approach to using LLMs to solve domain-specific word problems centers on achieving both an accurate understanding of the word problem in plain English and verifiable correctness through predetermined equations and algorithmic solutions. We achieve this through a three-step process. In the first step, the LLM identifies the problem type using comprehensive classification schemes. In the second step, we query the LLM for variables based on the problem type classification. In the third step, we use a custom solver to provide a verifiably correct step-by-step equation solution. Applying this approach to a subset of physics word problems, we created an “expert” mobile experience that allows users to scan word problems and view step-by-step solutions.
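The three steps can be sketched in Swift as follows. This is a minimal illustration under stated assumptions, not our actual code: the `ProblemType` cases, the `LanguageModel` protocol, the `StubModel` with canned responses, and the variable names `v0`, `a`, and `t` are all hypothetical, and the stub stands in for the real LLM call so the sketch runs offline.

```swift
// Hypothetical problem types; the app's real classification scheme is not shown here.
enum ProblemType {
    case constantAcceleration
    case freeFall
}

// Stand-in for the LLM client (assumption: the real app queries an LLM instead).
protocol LanguageModel {
    func classify(_ problem: String) -> ProblemType?
    func extractVariables(from problem: String, as type: ProblemType) -> [String: Double]
}

// Canned responses so the sketch is deterministic and needs no network access.
struct StubModel: LanguageModel {
    func classify(_ problem: String) -> ProblemType? { .constantAcceleration }
    func extractVariables(from problem: String, as type: ProblemType) -> [String: Double] {
        ["v0": 0, "a": 2, "t": 5]
    }
}

struct Solution {
    let steps: [String]
    let answer: Double
}

// Step 3: a deterministic solver built on the predetermined equation v = v0 + a*t.
func solveFinalVelocity(_ vars: [String: Double]) -> Solution? {
    guard let v0 = vars["v0"], let a = vars["a"], let t = vars["t"] else { return nil }
    let v = v0 + a * t
    return Solution(
        steps: ["v = v0 + a*t", "v = \(v0) + \(a) * \(t)", "v = \(v)"],
        answer: v
    )
}

func solve(_ problem: String, using model: LanguageModel) -> Solution? {
    // Step 1: the model classifies the problem type.
    guard let type = model.classify(problem) else { return nil }
    // Step 2: the model extracts the variables that this problem type needs.
    let vars = model.extractVariables(from: problem, as: type)
    // Step 3: the algorithmic solver, not the model, performs the computation.
    switch type {
    case .constantAcceleration:
        return solveFinalVelocity(vars)
    case .freeFall:
        return nil // not covered in this sketch
    }
}

let result = solve(
    "A car starts from rest and accelerates at 2 m/s^2 for 5 s. Find its final velocity.",
    using: StubModel()
)
```

Because the model only classifies and extracts, every number in the final answer comes from the solver, which can be tested independently of the LLM.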

A paper on AI expert models inspired our project. The original paper details an AI expert that completes a task from start to finish; our approach is slightly different. We use AI to identify relevant variables, but we rely on equation-solving algorithms to perform the final computations.

What are the core aspects of your project?

Our project provides users the ability to scan/type in physics word problems (from limited domains) and receive step-by-step solutions. The core components of the project are scanning input, prompting an LLM, running an equation solver, and displaying output to users.

What are the goals/vision for this project?

Our goal is to leverage the power of AI to serve as an expert in physics. Current LLMs perform well at parsing text, yet they are less reliable when it comes to mathematical operations. We therefore use AI to parse the text and extract the variables of a physics problem, and then hand off to an algorithmic solver. This approach leverages AI while ensuring correctness through a human-verified equation solver.

What drove your design choices?

Our design choices were inspired by potential users of the product; we chose our software development tools based on the end product we would create. Operating under the assumption that the use case for our product would be on-the-go convenience, we opted for a mobile app as opposed to a desktop app. Additionally, within the world of mobile development, multiple members of our team have experience with Swift/SwiftUI, so we chose to pursue iOS development exclusively.

What does your project do? What was your client hoping to get out of it?

Our project provides physics students access to step-by-step solution guides. Putting ourselves in the shoes of a potential user, we prioritized accessibility and ease of use. As a mobile app, our product is portable, and our variety of inputs (scanning, text, speech) along with clear, high-contrast design principles make the app a friendly experience for all users. Our clients loved our pitch and look forward to seeing the result of our product.

What are the project requirements? How did you address the requirements?

The project requirements included incorporating agile methodologies/practices/tools, using a version control system, applying TDD, creating technical documents to track progress and solutions, applying industry tools, and explaining practical and ethical concerns associated with our project. We tackled this project using a Scrum team format with a Scrum Master, a Product Owner, and three designers/developers. The relevant agile tools we used consisted of Git for version control and GitHub Projects for tracking tasks.

Our LLM is a black box, but we applied TDD practices when constructing our equation solvers. For technical documentation, we created a paper sample, managed an ongoing project report, built road maps, and gave a demo each week with slides to contextualize and explain progress.

To gain experience with industry tools, we used Git and GitHub Projects, as mentioned earlier, as well as Xcode as our IDE and Swift and SwiftUI as our language and UI framework. We leveraged Swift packages for implementing scanning, access to an LLM, and LaTeX formatting, among other tasks. Potential practical and ethical problems associated with our app include accessibility and limited control once the product is in users’ hands. Our app is designed for iOS mobile devices only. If a student doesn’t have an iPhone, they will be at a disadvantage compared to another student who has an iPhone and could use our product, but hopefully, they can find comparable resources. It is impossible for us to limit the actions of users once they download the app. If an instructor does not allow a student to use our software, it is not our responsibility to police/restrict those specific students from using our service.

Shifting to the soft-skills and deliverables side of the project, we were asked to employ active listening and communication skills, as well as provide weekly demos. Making use of our in-class work time, we effectively planned architecture, created task lists, divided tasks, and continuously checked in as a team to assess progress. We successfully met each deadline and delivered informative and engaging presentations at each demo, receiving feedback from our client and peers, which aided the direction of our development.

Future work. If you were to continue this project, what would be the next steps?

Down the road, the next steps would be expanding the domain of areas the app supports, as well as implementing more accessibility elements. The current implementation is limited to kinematics, but it could be expanded to support other areas of physics, such as waves, energy, and more. In terms of accessibility, potential additions would be text-to-speech and language translation. Future work could also consist of offloading more of the solving to AI while maintaining highly accurate responses.

Show and describe your process to design and develop your project

Our design and development process took place over five weeks. We spent the first two weeks planning. We started by reading the paper our project was based on, and from that, chose a subset of its solution to focus on. This process led us to our goal: we would solve physics word problems. Next, we worked on defining our scope. We compiled a list of physics word problem examples, types, and equations that we would target for our software solution. At the same time, we solidified the UI flow we wanted, researched tools and libraries, and constructed a basic architecture. This was all done by week 2.

The next three weeks were all implementation. We worked together using GitHub and GitHub Projects, which kept us organized and allowed us to make quick progress. Once all of our design and architecture decisions were made, the development process was straightforward. One key change made later in the development process was the decision to move from a locally run LLM to the more capable Gemini. This change did not have a large effect on the structure of our code, but it was the key that enabled our first successful tests.

By the end of the process, we met our goal. The software currently works for a few example problem types. Our design is also easy to extend, meaning future work could go towards making the software widely applicable.

Talk about your challenges and achievements

One of the main challenges we faced was distinguishing between the problem types the AI can recognize. Since a problem can often be interpreted and solved in several equally logical ways, the LLM may confuse these approaches, while our core solver still has to cater to all of them. Thus, for a problem that can be interpreted in different ways, the response from the LLM may be non-deterministic, causing different solutions to arise from different trials of the same problem. To address this, we performed careful prompt engineering, making sure that keywords are identified so the LLM arrives at the most accurate problem classification and behaves as deterministically as possible.
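One common way to push an LLM toward consistent classifications is to restrict it to a fixed label set and reject anything else. The sketch below illustrates that idea only; the labels in `allowedLabels`, the prompt wording, and the helper names `classificationPrompt` and `parseLabel` are hypothetical and are not our actual prompts.

```swift
import Foundation

// Hypothetical label set; the app's real problem types are not listed here.
let allowedLabels = ["constant_acceleration", "free_fall", "projectile"]

// Builds a prompt that asks for exactly one label from the fixed list.
func classificationPrompt(for problem: String) -> String {
    """
    Classify the physics word problem below.
    Answer with exactly one label from this list and nothing else:
    \(allowedLabels.joined(separator: ", "))

    Problem: \(problem)
    """
}

// Accepts a response only if it is one of the allowed labels after normalization.
func parseLabel(_ response: String) -> String? {
    let cleaned = response
        .trimmingCharacters(in: .whitespacesAndNewlines)
        .lowercased()
    return allowedLabels.contains(cleaned) ? cleaned : nil
}

let prompt = classificationPrompt(for: "A ball is dropped from a 20 m tower.")
let accepted = parseLabel(" Free_Fall\n")
let rejected = parseLabel("I think this is a free fall problem.")
```

A rejected response can trigger a retry, so free-form answers never reach the solver.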

Our biggest achievement was the reliability of our solver, which handles simple arithmetic calculations and different units while remaining maintainable and scalable. The classes we defined (e.g., Value, Unit, Variable) were designed and implemented so they can be reused and inherited to support future problem types that may require different subtypes of those classes.
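To illustrate how such types might fit together, here is a minimal sketch. Only the names Value, Unit, and Variable come from our project; every field, method, and the `VectorVariable` subclass below are assumptions made for illustration and may differ from our actual implementation.

```swift
// Hypothetical unit: a symbol plus a factor that converts to the SI base unit.
struct Unit: Equatable {
    let symbol: String
    let toBase: Double

    static let meter = Unit(symbol: "m", toBase: 1)
    static let kilometer = Unit(symbol: "km", toBase: 1000)
}

// A magnitude paired with its unit.
struct Value {
    let magnitude: Double
    let unit: Unit

    // Converts through the base unit; assumes both units measure the same quantity.
    func converted(to target: Unit) -> Value {
        Value(magnitude: magnitude * unit.toBase / target.toBase, unit: target)
    }
}

// A named quantity in a problem; its value is nil until it is known or solved.
class Variable {
    let name: String
    var value: Value?

    init(name: String, value: Value? = nil) {
        self.name = name
        self.value = value
    }

    var isKnown: Bool { value != nil }
}

// Hypothetical subtype showing how a future problem type could extend Variable.
final class VectorVariable: Variable {
    let direction: String

    init(name: String, value: Value? = nil, direction: String) {
        self.direction = direction
        super.init(name: name, value: value)
    }
}

let distance = Value(magnitude: 1.5, unit: .kilometer).converted(to: .meter)
let displacement = VectorVariable(name: "d", value: distance, direction: "north")
let unknownTime = Variable(name: "t")
```

Keeping units attached to values lets the solver normalize inputs before applying an equation, and subclassing Variable leaves room for quantities that need extra information, such as direction.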

Acknowledgements and References

We would like to thank the course instructor, Professor Eliot, and our course mentor, Livia Stein Freitas. We would also like to thank our alumni, the Vivero Fellows, and our peers for feedback throughout the process.

Research Paper: Kook, H. J., & Novak, G. S. (1991). Representation of models for expert problem solving in physics. IEEE Transactions on Knowledge and Data Engineering, 3(1), 48–54.
Physics Textbook: Walker, J., Halliday, D., Resnick, R., & Trees, B. R. (2018). Fundamentals of physics. Physics & Astronomy Faculty Books (Vol. 1).
Gemini AI
Git and GitHub, GitHub Projects
Google Suite
Overleaf
Swift/SwiftUI
Xcode
