About Me

Xiaoyang Wan

I am a master’s student in the Computer and Information Program at the University of Pennsylvania. My current research focuses on natural language processing (NLP), multimodal models, and image editing.

During my undergraduate years at Zhejiang University, I had the privilege of being advised by Prof. SiLiang Tang and Prof. Juncheng Li from the DCD Lab in the School of Computer Science. Their mentorship significantly shaped my academic interests and approach to research.

You can find my Curriculum Vitae and check out my work on GitHub.

Feel free to contact me via email at wan3@seas.upenn.edu.

Here is my research experience. I am interested in the intersection of large language models and image comprehension. You are welcome to contact me.

Debugged and optimized existing image editing pipelines, including diffusion models and Hugging Face frameworks, to address challenges in large-scale image editing tasks.
Configured and executed various scripts for the AnyEdit project, including setting up the environment (e.g., Conda), downloading pre-trained weights, and running multiple pipelines with specified parameters.
Compared performance across pipelines to identify the best model for a given task, analyzing results from the move&resize pipeline for accuracy, efficiency, and visual consistency. Used masks of different objects in the image to identify the most suitable object for the task.
Designed and implemented a checkpoint save-and-resume mechanism, allowing the system to automatically save progress and resume operations seamlessly after interruptions.
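The checkpoint save-and-resume idea can be illustrated with a minimal Python sketch. This is not the code used in the project; the function names, the JSON file layout, and the `save_every` parameter are all assumptions made for illustration. It records the IDs of finished items, writes them to disk atomically, and skips them on the next run.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the set of item IDs already processed (empty if no checkpoint)."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as f:
        return set(json.load(f)["done"])


def save_checkpoint(path, done):
    """Write progress to a temp file, then rename it over the checkpoint.

    os.replace is atomic, so an interruption cannot leave a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"done": sorted(done)}, f)
    os.replace(tmp_path, path)


def run_with_resume(items, process, checkpoint_path, save_every=10):
    """Process items, skipping those recorded in the checkpoint.

    `items` is an iterable of string IDs and `process` is a hypothetical
    per-item callable (for example, one image edit). Returns the number of
    items processed in this run.
    """
    done = load_checkpoint(checkpoint_path)
    processed = 0
    try:
        for item_id in items:
            if item_id in done:
                continue  # finished in an earlier run
            process(item_id)
            done.add(item_id)
            processed += 1
            if processed % save_every == 0:
                save_checkpoint(checkpoint_path, done)
    finally:
        # Save on normal exit and on interruption alike.
        save_checkpoint(checkpoint_path, done)
    return processed
```

Calling `run_with_resume` a second time with the same checkpoint path picks up where the previous run stopped, since completed IDs are skipped.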

Built a Retrieval-Augmented Generation (RAG) knowledge base by constructing datasets that pair comic images with contextual information retrieved from Wikipedia, using Contriever, a powerful multilingual and cross-lingual retrieval model.
Enabled language models like GPT to generate contextually accurate responses by retrieving relevant comic descriptions from the knowledge base and combining them with generated image descriptions.

Here are some interesting projects I have done in my courses and spare time. You are welcome to discuss any details with me.

Actively participated in the entire game development process, from the initial proposal, prototypes, and rough demos, through alpha and beta testing, to final player testing and game release.
Built a robust turn-based game system in Unity and C# that supports multiple players casting skills at arbitrary times.
Achieved an A+ (top 5% in the course), and our game took 2nd place in the final showcase. This was accomplished without any prior hands-on experience in Unity, while competing against seasoned game developers in the course.

Created stylized AIGC portrait styles on top of the base model majicMix-v6, using prompt engineering to achieve a consistent art style, and trained a LoRA to generate New Year-style portraits.
Integrated the new styles we created into facechain, a deep-learning toolchain released by modelscope for generating your DigitalTwin.
Won 3rd prize in the 6th Open Source Innovation Competition for the new portrait style we provided.

Used the UmiJS framework, the Ant Design component library, and React to build a user interface for device management. Implemented features include login authentication, data analytics, and device location tracking.
Built the backend in Go with the Gin framework, using GORM for database operations against a MySQL database.