For this year’s project, I will be partnering with Marina Lui to develop Lantern AI, a Chinese character recognition program. Using Python, GitHub, and machine learning, we hope to train an AI to do one of two things: recognize handwritten Chinese characters from stroke-by-stroke user input, or look at a typed Chinese character and determine its radicals (the base components that indicate a character’s meaning and/or sound). The program will then output the character’s English meaning and pinyin. Our first step, which will be quite a challenge, is to gather enough data to train the AI. We plan to collect data on 200 to 2,000 of the most commonly used characters (the average individual uses between 3,000 and 4,000 characters). It is important that we build our program around commonly used characters because our target audience is beginner Chinese learners. We hope to help students learn by letting them look up the definition of characters they know how to draw but do not recognize.
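To make the final output step concrete, here is a minimal sketch of how a recognized character could be mapped to its pinyin and English meaning. It assumes the recognizer has already produced a single character. The names `Entry`, `DICTIONARY`, `describe`, and `format_result` are hypothetical, and the three entries are placeholders for the 200 to 2,000 characters we plan to cover.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """Pinyin and English meaning for one character."""
    pinyin: str
    meaning: str


# Hypothetical placeholder data; the real table would hold the
# most commonly used characters gathered for the project.
DICTIONARY = {
    "水": Entry("shuǐ", "water"),
    "火": Entry("huǒ", "fire"),
    "人": Entry("rén", "person"),
}


def describe(character: str) -> Optional[Entry]:
    """Return the entry for a character, or None if it is not covered."""
    return DICTIONARY.get(character)


def format_result(character: str) -> str:
    """Build the line of text shown to the learner."""
    entry = describe(character)
    if entry is None:
        return f"{character}: not in the dictionary yet"
    return f"{character}: {entry.pinyin}, {entry.meaning}"


if __name__ == "__main__":
    print(format_result("水"))
```

Keeping the lookup separate from the recognizer means the same table could serve either approach, handwriting recognition or radical analysis.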
Before we start, we need to do more research by looking at similar projects. One example is Google’s game Quick, Draw!, which uses a neural network to recognize doodles. We will also identify how we can make our project different from existing solutions, such as Google Translate and dictionaries like Pleco. This project requires a lot of time and dedication, so we are starting as soon as possible!