
Multimodal Virtual Escape Room

This project builds a more immersive PC gaming experience for a Virtual Escape Room game. We added speech and gesture control to the game, with no additional hardware required, using Google's Speech Recognition API and the MediaPipe ML framework.
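As a rough illustration of the speech-control side, recognized transcripts can be matched against a small phrase-to-key table. The phrases and key bindings below are illustrative assumptions, not the project's actual command set:

```python
# Hypothetical sketch: mapping recognized speech phrases to PC control keys.
# The phrases and bindings are assumptions for illustration only.

SPEECH_COMMANDS = {
    "move forward": "w",
    "move back": "s",
    "turn left": "a",
    "turn right": "d",
}

def phrase_to_key(transcript):
    """Return the control key for a recognized phrase, or None if no match."""
    normalized = transcript.lower().strip()
    for phrase, key in SPEECH_COMMANDS.items():
        if phrase in normalized:
            return key
    return None
```

In the real system, `transcript` would come from Google's Speech Recognition API running in the background; here the lookup is kept self-contained so the mapping logic is clear.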

 

My Role: System Architecture, System Integration, and GUI Automation.
This was a group project with Katie Chen and Riley Shu, built for the Intelligent Multimodal User Interfaces class at MIT, where we learned about advances in Human-Computer Interaction.

Demo

On the left, my teammate Katie demonstrates the Virtual Escape Room game with the speech and gesture controls we developed.

System Model

I built a user-intent-based system model to frame the architecture and guide the project's software development, keeping what the user wants to do at the center.

[System model diagrams from our 6.835 final presentation]

The gesture and speech recognition software runs as a separate thread in the background. Once the system determines the user's intent, it triggers the corresponding PC control keys through a Python-based GUI automation package, which drives player movement in the Unity game. The software also receives game-state data from Unity over ZeroMQ to better determine user intent, e.g., whether the user wants to move or activate a puzzle.
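The disambiguation step above can be sketched as a small function: the same command resolves to different keys depending on the game state Unity reports. The state flag, command names, and key bindings are assumptions for illustration; the GUI-automation and ZeroMQ calls are only noted in comments so the sketch stays self-contained:

```python
# Hypothetical sketch of intent resolution: the same recognized command can
# mean movement or puzzle interaction depending on game state from Unity.
# The `near_puzzle` flag and key bindings are illustrative assumptions.

MOVEMENT_KEYS = {"forward": "w", "back": "s", "left": "a", "right": "d"}

def resolve_intent(command, near_puzzle):
    """Map a recognized command to a control key, using game context."""
    if command == "activate":
        # Only meaningful when Unity reports the player is near a puzzle.
        return "e" if near_puzzle else None
    return MOVEMENT_KEYS.get(command)

# In the full pipeline, the returned key would be pressed via a GUI
# automation package (e.g. pyautogui.press(key)), and `near_puzzle` would
# be read from game-state messages arriving on a ZeroMQ SUB socket.
```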

©2023 by Sahas Gembali
