Projects & Publications

Shield Decentralization for Safe Multi-Agent Reinforcement Learning

Daniel Melcer, Christopher Amato, Stavros Tripakis- NeurIPS 2022 Oral (2.1% acceptance rate)

(Abstract) Learning safe solutions is an important but challenging problem in multi-agent reinforcement learning (MARL). Shielded reinforcement learning is one approach for preventing agents from choosing unsafe actions. Current shielded reinforcement learning methods for MARL make strong assumptions about communication and full observability. In this work, we extend the formalization of the shielded reinforcement learning problem to a decentralized multi-agent setting. We then present an algorithm for decomposition of a centralized shield, allowing shields to be used in such decentralized, communication-free environments. Our results show that agents equipped with decentralized shields perform comparably to agents with centralized shields in several tasks, allowing shielding to be used in environments with decentralized training and execution for the first time.

ProofViz: An Interactive Visual Proof Explorer

Daniel Melcer, Stephen Chang- TFP 2021

(Abstract) We introduce ProofViz, an extension to the Cur proof assistant ecosystem that enables interactive visualization and exploration of in-progress proofs. The tool displays a representation of the underlying proof tree, information about each node in the tree, and the values of partially-completed proof terms. Users can interact with the proof by executing tactics, changing the focus to any hole, and undoing or redoing their previous interactions. We anticipate that ProofViz will be useful both as a tool to introduce tactic proofs to learners of dependent type systems, and as a mechanism for advanced users looking for insight into the effects of their custom tactics.

Verification-Guided Tree Search

Alvaro Velasquez, Daniel Melcer- AAMAS 2020 (Extended Abstract)

(Abstract) Monte-Carlo Tree Search (MCTS) algorithms have been adopted by the artificial intelligence planning community for decades due to their ability to reason over long time horizons while providing guarantees on the convergence of the solution policy. In recent years, such algorithms have been modernized through the integration of Convolutional Neural Networks (CNNs) as part of the state evaluation process. However, both traditional and modern MCTS algorithms suffer from poor performance when the underlying reward signal of the environment is sparse. In this paper, we propose a verification-guided tree search solution which incorporates a reward shaping function within modern MCTS implementations. This function leverages the mathematical representation of the underlying objective by leveraging techniques from formal verification.


HackMIT 2019 - Best Command Line Tool Runner Up

What if you didn’t have to install RDP software to view the screen of another computer? What if you could just use SSH or something to share your screen? What would that even look like? This. This is what it would look like.

As it turns out, your (linux) terminal provides enough functionality to allow you to view a live desktop feed, handle keypresses and mouse inputs, and automatically adjust the display when the window is resized.

Charity Cart

Brickhack 2019 - Most socially impactful project

Donate to your favorite charities without spending any more than you already do.

Find generic versions of the food you already buy, and donate the price difference to charities.

Don’t know who to donate to? The application automatically suggests charities based on the types of items in your cart.

Git Mess

BostonHacks 2018 - Best Use of Twilio API

Send and receive SMS messages using the tools you already know and love. Specifically git.

Type out a message and git push to send. Then git pull to check for any new messages.

Includes an ASCII art converter to view any image attachments that you may have been sent.

Invertible Programming Language

This project was made in collaboration with Josh Goldman for Building Extensible Systems (“Hack your own language”).

This is a Racket-based programming language that is capable of analyzing simple functions and automatically creating the inverse of that function such that applying the inverse of a one-argument function to the result of the “normal” function results in the original argument given.

> (define f->c
    (λ-auto-invertible (x)
    ((muln 5/9)
    ((subn 32) x))))
> (f->c 32)
> (f->c -40)
> (define c->f (invert f->c))
> (c->f 0)
> (c->f -40)


Ethereum Connect Four

Play Connect Four on the Ethereum blockchain! A smart contract autonomously manages the current participants of a game, ensures the validity of moves made by a user, and awards of the winner of a game. Currently deployed on Ropsten testnet.

Note Requires MetaMask extension to interact with the application. If you would like to go to the game anyways, click here.

Riemann Zeta Function GPU Fraction Search

The Riemann Zeta function is an accurate predictor of the number of primes less than a given value. The inputs of the Zeta function that cause the function to equal zero have special mathematical properties.

Under the mentorship of David Biersach, I wrote a program that searched for a pattern in the smallest 100 known inputs with this property using their continued fraction representations. I used the CUDA framework to accelerate the computation by performing calculations hundreds of times in parallel on a consumer graphics card.

This research was conducted as part of the 2016 High School Research Program for Brookhaven National Laboratory.

Image source: Linas Vepstas [CC BY-SA 3.0]