  • Junior year I took a class called “Advanced Computer Science: AI and Data Sciences,” a post-AP Computer Science course. The course was structured with minimal direct instruction, so the majority of the learning happened through self-exploration. Our first project was to explore Markov chains.

  • In searching for a dataset to run a Markov chain on, I ran across the complete dialogue of the original Star Wars trilogy. As a science fiction and Star Wars fan, I knew it was the perfect candidate.

    The code uses a Markov chain to analyze the speech patterns of each character, then generates a fake dialogue between random characters. Due to the nature of a Markov chain, most of the dialogue has the appearance of making sense while actually meaning next to nothing. There's also no continuity between the lines of dialogue: a Markov chain can only do so much, and unfortunately continuity is beyond it. Far more lines per character would have helped, but the data is limited by the number of Star Wars movies that exist. The dialogue is entertaining nonetheless, and it highlights many of the signature phrases characters use over and over.

    The next iteration of this project, meant to fix the meaning and continuity issues, would use a neural network, which can produce more lifelike imitations of dialogue.

  • The dialogue came packaged as a .txt file, with each piece of dialogue on its own line and the line number, character name, and dialogue each wrapped in quotes. The first step, therefore, was processing the file into something more manageable, such as a dataframe. Once the dialogue was in a dataframe, it was split further by character, so that all of each character's lines were grouped together. Every character with fewer than 20 lines was dropped, since there wasn't enough for the Markov chain to learn from. The order the characters talked in was also collected into a list, again with sub-20-line characters dropped. A sketch of this preprocessing follows.
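
    A minimal sketch of that preprocessing step in Python with pandas; the file name, quoting layout, and variable names are illustrative, not the original code:

        import pandas as pd

        rows = []
        # Assumed layout per line: "1" "THREEPIO" "Did you hear that?"
        with open("SW_EpisodeIV.txt") as f:  # hypothetical file name
            for line in f:
                parts = line.strip().split('"')
                # quoted fields land at the odd indices after splitting on '"'
                if len(parts) >= 6:
                    rows.append({"line_no": parts[1],
                                 "character": parts[3],
                                 "dialogue": parts[5]})

        df = pd.DataFrame(rows)

        # Drop every character with fewer than 20 lines
        counts = df["character"].value_counts()
        df = df[df["character"].isin(counts[counts >= 20].index)]

        # Group each character's lines together, and keep the talking order
        lines_by_character = df.groupby("character")["dialogue"].apply(list).to_dict()
        talking_order = df["character"].tolist()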

    The Markov function runs a simple Markov chain, which looks at every word in a string and the words adjacent to it, learning which words typically precede and follow which other words. This is how the Markov chain works, and why its output doesn't always make logical sense: it doesn't analyze sentence structure, only the typical order of words in relation to other words. A sketch of such a chain is below.
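
    As a sketch, a first-order chain of this kind can be built as a map from each word to the words observed directly after it (the function names here are illustrative):

        import random
        from collections import defaultdict

        def build_chain(text):
            """Map each word to the list of words seen immediately after it."""
            chain = defaultdict(list)
            words = text.split()
            for current, following in zip(words, words[1:]):
                chain[current].append(following)
            return chain

        def markov(chain, length):
            """Walk the chain, choosing each next word from the observed followers."""
            word = random.choice(list(chain.keys()))
            output = [word]
            for _ in range(length - 1):
                followers = chain.get(word)
                # On a dead end, restart from a random word
                word = random.choice(followers) if followers else random.choice(list(chain.keys()))
                output.append(word)
            return " ".join(output)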

    The Markov function is first run on the character order to determine a talking order for the output dialogue. For each character in that order, a line of dialogue of random length is then generated with the same Markov function, as sketched below.
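
    Putting the pieces together, building on the sketches above (the line lengths are illustrative, and character names are assumed to be single tokens):

        # One chain per character from their pooled lines, plus a chain
        # over the order in which characters speak.
        character_chains = {name: build_chain(" ".join(lines))
                            for name, lines in lines_by_character.items()}
        order_chain = build_chain(" ".join(talking_order))

        # Pick a speaker order, then generate a random-length line for each.
        for speaker in markov(order_chain, 10).split():
            line = markov(character_chains[speaker], random.randint(5, 15))
            print(f"{speaker}: {line}")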

    The output is 10 lines of dialogue from a variety of characters.

Star Wars Dialogue Generator

  • Starting the summer between my junior and senior year, I've worked in the Duke Quantum Center as an electrical engineering intern. Most of my work centers on printed circuit boards (PCBs) and components for electrical systems. I was tasked with finding a replacement for a system's low-dropout regulators (LDOs). After finding a replacement with the same specs and ordering it, the new parts needed to be tested to determine whether they could stand in for the old LDOs.

  • To test the LDOs, they were connected to 15 V power and the output was measured on an oscilloscope. The waveform the LDOs produced on the oscilloscope was then captured and exported as raw data. From there, a Fourier transform could be run on the data to separate the waveform into its distinct frequency components. After running and graphing the Fourier transform for each captured waveform, the graph and accompanying data were recorded in a GitHub document viewable by anyone in the lab.

  • The data from the oscilloscope came in as an Excel sheet, so the first task was to read the Excel file, which was then turned into a dataframe. The dataframes were also converted to arrays. The Fourier transform, done through a package called scipy, requires both the sample spacing (0.0001 s) and the size of the array. Once this information was collected, the Fourier transform was run on the array, and a Fourier transform frequency function was run using the array size and the sample spacing. These two results formed the y and x axes of the generated plot (see left). Finally, the sum of the magnitudes of the Fourier transform was computed, giving the noise spectral density. A condensed sketch of the pipeline follows.
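
    A condensed sketch of that pipeline in Python; the file and column names are assumptions, while the 0.0001 s sample spacing and the scipy transform calls come from the description above:

        import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
        from scipy.fft import fft, fftfreq

        # Read the oscilloscope export (file and column names are hypothetical)
        df = pd.read_excel("ldo_capture.xlsx")
        signal = df["Voltage"].to_numpy()

        timestep = 0.0001        # sample spacing from the scope, in seconds
        n = signal.size

        # FFT of the captured waveform, and the matching frequency axis
        spectrum = fft(signal)
        freqs = fftfreq(n, timestep)

        # Plot the one-sided magnitude spectrum
        plt.plot(freqs[:n // 2], np.abs(spectrum[:n // 2]))
        plt.xlabel("Frequency (Hz)")
        plt.ylabel("Magnitude")
        plt.show()

        # Sum of the magnitudes, used as the noise figure of merit
        print("Noise spectral density:", np.abs(spectrum).sum())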

Fourier Transform Analysis

  • After watching a couple of episodes of Drive to Survive on Netflix, I watched my first Formula 1 race the summer before my senior year. I absolutely fell in love with the sport! It's an adrenaline-packed pressure cooker of insane engineering and data science.

    I also love data. Data has so many stories to tell and so many secrets hidden within. Patterns resonate with me, and sleuthing for patterns is what data analysis is all about.

    When I ran across an API for F1 data from 1950 to the present day, I knew I wanted to do something with it.

  • I started by downloading the API data as 14 individual CSV files, a file type I felt more comfortable working with. With the data in dataframes, I built a class that parses the data, builds a schema, and populates several SQL tables in a database on a remote server; a sketch of that loader follows.
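
    A sketch of that loading step, assuming pandas and SQLAlchemy; the class name, file names, table list, and connection string are placeholders, not the actual project code:

        import pandas as pd
        from sqlalchemy import create_engine

        # Placeholder connection string for the remote database
        engine = create_engine("postgresql://user:password@remote-host/f1")

        class F1Loader:
            """Parse the downloaded CSV files and populate SQL tables."""

            # Hypothetical subset of the 14 CSV files
            TABLES = ["races", "drivers", "results", "circuits"]

            def __init__(self, data_dir="data"):
                self.data_dir = data_dir

            def load(self):
                for table in self.TABLES:
                    df = pd.read_csv(f"{self.data_dir}/{table}.csv")
                    # to_sql infers a schema from the dataframe's dtypes
                    df.to_sql(table, engine, if_exists="replace", index=False)

        F1Loader().load()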

    I then wanted to have continuously up-to-date information, so I used the API and an XML parser to collect the newest data, as sketched below. I am currently working on a countdown GUI that shows the time remaining until an upcoming race, or the winners of a previous race in the season. The start of this GUI can be seen on the right.
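
    A sketch of that update step, assuming an Ergast-style endpoint (a public F1 API serving XML from 1950 to the present, which matches the description); the URL and element names are assumptions:

        import requests
        import xml.etree.ElementTree as ET

        # Hypothetical endpoint for the most recent race's results
        URL = "http://ergast.com/api/f1/current/last/results"

        root = ET.fromstring(requests.get(URL, timeout=10).content)

        def local(tag):
            """Strip the XML namespace so elements match by local name."""
            return tag.rsplit("}", 1)[-1]

        for elem in root.iter():
            if local(elem.tag) == "RaceName":
                print("Race:", elem.text)
            elif local(elem.tag) == "FamilyName":
                print("Winner:", elem.text)  # first result listed is the winner
                break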

    Even after I finish the GUI, there's so much more that can be done with the data. The API pulls so much information: patterns could be found, a predictor could be built, a visual representation could be developed. I have so many ideas, and not enough time to code them all!

F1 Data Analysis

[IN PROGRESS]