Inspirit AI: Planet Hunters

The universe is vast and still expanding, so it seems statistically unlikely that we humans are the only living species in it, or that Earth is the only planet humans could inhabit. Exoplanets, which are planets that orbit a star other than our own sun, were first discovered in 1992. Transit photometry is a technique that measures a distant star's flux (the amount of light we receive from it) over time. Organizations focused on astronomy, like NASA, study fluctuations in this light from various distant stars to determine whether exoplanets orbit them. In graphs of these measurements (available in the Planet Hunters video), scientists look for dips in brightness and judge whether they are consistent with an exoplanet in orbit around that particular star. For example, if the light dims in a repeating pattern over a certain time period, a likely explanation is an orbiting exoplanet that blocks a small part of the star's light each time it passes between the star and the telescope observing it, once per complete orbit.
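
As a rough illustration of the transit idea above, here is a minimal Python sketch that simulates a star's brightness with one dip per orbit and then recovers the spacing between dips. Everything in it is an assumption made for illustration and not taken from our notebooks: the function names, the box-shaped dip, the 1% depth, the 50-step period, the bounded uniform noise, and the 0.995 threshold.

```python
import random

def simulate_light_curve(n_points=300, period=50, transit_width=3,
                         depth=0.01, noise=0.002, seed=0):
    """Return relative flux values with one box-shaped dip per orbit.

    All parameters are hypothetical toy values, not real telescope data.
    """
    rng = random.Random(seed)
    flux = []
    for t in range(n_points):
        in_transit = (t % period) < transit_width
        value = 1.0 - (depth if in_transit else 0.0)
        # Bounded uniform noise keeps this toy example deterministic in shape.
        flux.append(value + rng.uniform(-noise, noise))
    return flux

def find_dip_starts(flux, threshold):
    """Return the time steps at which the flux first drops below the threshold."""
    starts = []
    for t, value in enumerate(flux):
        below = value < threshold
        was_below = t > 0 and flux[t - 1] < threshold
        if below and not was_below:
            starts.append(t)
    return starts

flux = simulate_light_curve()
dip_starts = find_dip_starts(flux, threshold=0.995)
# Spacing between consecutive dips; equal spacing hints at a single orbit period.
gaps = [later - earlier for earlier, later in zip(dip_starts, dip_starts[1:])]

print("dip starts:", dip_starts)
print("gaps between dips:", gaps)  # every gap equals the simulated period, 50
```

Evenly spaced dips are what make a planet a plausible explanation. Real light curves are much noisier than this, which is part of why a learned classifier is useful.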

Since billions of trillions of stars exist in the universe, it would be impossible for humans to keep track of every potential “exoplanet star” (a star with an exoplanet orbiting it) by hand, which is exactly why this project uses artificial intelligence. AI models can process large volumes of data far faster than people can, making them a good tool for such tedious work. Throughout the course, as interns, we learned about the difference between training and testing data when building an AI model. For our classification AI, the training data was the labeled data fed into the model, which it could interpret, or “learn from,” and base its predictions on during the testing phase. The testing data was data the model had to classify on its own, without seeing the labels, based on what it learned from the training data. (The actual classifications of the testing data are shown in the confusion matrices in the video.) Our training data consisted of graphs from NASA labeled “non-exoplanet star” or “exoplanet star,” while our testing data consisted of graphs presented without their labels, which the AI model classified as a star with or without an exoplanet orbiting it.
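
To make the training/testing distinction concrete, here is a small Python sketch using a one-nearest-neighbor classifier on made-up brightness curves. It is only a stand-in under stated assumptions: the five-point curves, the labels, and the helper names are all hypothetical, and this is not the model or data from our notebooks. The test labels are held back while predicting and used only afterward to build the confusion matrix.

```python
def distance(a, b):
    """Euclidean distance between two equal-length flux curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_1nn(train_X, train_y, sample):
    """Give a sample the label of its closest training example."""
    nearest = min(range(len(train_X)),
                  key=lambda i: distance(train_X[i], sample))
    return train_y[nearest]

def confusion_matrix(actual, predicted):
    """Rows are actual labels (0, 1); columns are predicted labels (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Hypothetical training data: label 0 = non-exoplanet star, 1 = exoplanet star.
train_X = [
    [1.00, 1.00, 1.00, 1.00, 1.00],  # flat curve
    [1.00, 1.01, 0.99, 1.00, 1.00],  # flat curve with small wobble
    [1.00, 0.90, 1.00, 0.90, 1.00],  # repeating dips
    [1.00, 0.92, 1.00, 0.91, 1.00],  # repeating dips
]
train_y = [0, 0, 1, 1]

# Hypothetical testing data: the model sees test_X but not test_y.
test_X = [
    [1.00, 0.99, 1.00, 1.01, 1.00],
    [1.00, 0.91, 1.00, 0.93, 1.00],
    [0.99, 1.00, 1.00, 1.00, 1.01],
]
test_y = [0, 1, 0]  # held back, used only to score the predictions

predictions = [predict_1nn(train_X, train_y, sample) for sample in test_X]
matrix = confusion_matrix(test_y, predictions)

print("predictions:", predictions)   # [0, 1, 0]
print("confusion matrix:", matrix)   # [[2, 0], [0, 1]]
```

The diagonal of the matrix counts correct predictions, and the off-diagonal cells count the two kinds of mistakes: a missed exoplanet star and a false alarm.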

Over the course of the project, our group tested the performance of various AI models, including a KNN (k-nearest-neighbors) model, a Logistic Regression model, Decision Trees (Keras/TensorFlow), and more! However, we ran into issues with each model type and ultimately settled on a CNN (convolutional neural network) model, which is described further in the video. An additional issue was a large imbalance in the data. The vast majority (99.3%) of the testing and training data were stars without exoplanets in orbit (“non-exoplanet stars”), so the AI classified most unlabeled graphs as “non-exoplanet stars,” since it didn’t have enough “exoplanet star” graphs in its training data. To fix this, we used SMOTE (Synthetic Minority Oversampling Technique), which creates additional synthetic examples of the rarer class, to balance out the data sets. (See the description in the video.) In the final phase of the project, we tested our CNN model with the balanced SMOTE data, and it successfully identified all 5 exoplanets!
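
The imbalance matters because, with 99.3% of stars in one class, a model that always answers “non-exoplanet star” is already right 99.3% of the time while finding no exoplanets at all. The sketch below shows the core idea behind SMOTE in plain Python: pick a minority example, pick one of its nearest minority neighbors, and create a new point somewhere on the line between them. This is a simplified, hand-written illustration and not the implementation we ran; the function name `smote_like_oversample`, the two-number "curves," and the class sizes are all hypothetical.

```python
import random

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Simplified sketch of the SMOTE idea; hypothetical helper, not a library call.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the chosen point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: distance(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # how far to move from base toward the neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical toy data: 20 "non-exoplanet" points and only 3 "exoplanet" points.
majority = [[1.0 + 0.001 * i, 1.0 - 0.001 * i] for i in range(20)]
minority = [[0.90, 0.91], [0.92, 0.90], [0.91, 0.93]]

new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points

print("majority size:", len(majority))                    # 20
print("minority size after oversampling:", len(balanced_minority))  # 20
```

Because each synthetic point lies between two real minority examples, the new data stays inside the region the real “exoplanet star” examples already occupy instead of simply duplicating them. In Python projects, a common ready-made version is `SMOTE` from the imbalanced-learn library (`imblearn.over_sampling`).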

See the Python Google Colaboratory notebooks below:

Instructional Preparation: Linear Regression, Logistic Regression, Natural Language Processing, Neural Networks

Project Research & Testing: Planet Hunters 1, Planet Hunters 2, Planet Hunters 3