I am a machine learning researcher interested in reinforcement learning, multi-step reasoning, and structured representations for large language models. I received my M.S. in Machine Learning and Data Science from UC San Diego, where I worked in Prof. Zhiting Hu’s MixLab on offline RL, soft Bellman consistency, and MCTS-guided reasoning policy learning.
My recent work focuses on modeling and evaluating intermediate reasoning states along dimensions such as coherence, consistency, completeness, and grounding. I also design system-level methods for reliable LLM deployment, including retrieval-integrated workflows, value modeling for operator sequences, and scalable serving pipelines. In parallel, I collaborate with ShelteredAI to build applied LLM systems for real-world social-service environments.
My broader research goal is to develop structured, interpretable, and self-improving reasoning frameworks that unify reinforcement learning, verifiable intermediate states, and controlled structural evolution in agentic LLMs. I am currently applying to PhD programs for Fall 2026 to pursue these directions further.