Alperen Tercan
Hi! I am a PhD student in Electrical and Computer Engineering at the University of Michigan, advised by
Prof. Necmiye Ozay.
My research broadly focuses on abstractions, robustness, constraints, and preference learning in RL.
A brief statement of my research interests and prior work can be found
here.
Previously, I was a research intern at the Max Planck Institute for Software Systems in
Dr. Adish Singla's Machine Teaching group. I received my master's in Computer Science from Colorado State University, where I worked on Reinforcement Learning
and Formal Methods under the supervision of Prof. Vinayak Prabhu and Prof. Chuck Anderson.
My thesis topic was "Solving MDPs with Thresholded Lexicographic Ordering Using Reinforcement Learning". See my thesis here.
My research goal is to build intelligent systems that can reliably solve challenging real-world problems.
To this end, I enjoy combining ideas/tools from formal methods, symbolic reasoning, and machine learning.
For example, I am interested in designing neuro-symbolic systems that combine recent developments in deep learning with traditional approaches based on
symbolic reasoning. Such a synthesis has the potential to close the gap between the success stories of burgeoning AI fields like Reinforcement Learning (RL) and their real-world applicability
by enabling safe, provable, explainable, and sample-efficient learning.
While working towards this goal, several aspects make me particularly appreciate the journey:
finding interesting high-level ideas, developing them into complete systems through rigorous theoretical analysis,
adopting concepts and tools from different fields, and collaborating with researchers from diverse backgrounds.
Previously, I graduated with a B.S. in Electrical and Electronics Engineering from Bilkent University, Turkey.
Email /
GitHub /
LinkedIn /
CV
Research
Please see below for my main research work.
On the relation of bisimulation, model irrelevance, and corresponding regret bounds
Alperen Tercan, Necmiye Ozay
NeurIPS 2025 Workshop on Aligning Reinforcement Learning Experimentalists and Theorists
link /
State abstraction is a key tool for scaling reinforcement learning (RL) by reducing the complexity of the underlying Markov Decision Process (MDP). Among abstraction methods, bisimulation has emerged as a principled metric-based approach, yet its regret properties remain less understood compared to model irrelevance abstractions. In this work, we clarify the relationship between these two abstraction families: while model irrelevance implies bisimulation, the converse does not hold, leading to coarser abstractions under bisimulation. We provide the first regret bounds for policies derived from approximate bisimulation abstractions, analyzing both "naive" and "smart" refinement strategies for lifting abstract policies back to the original MDP. Our theoretical results show that smart refinement enjoys strictly better regret guarantees, and our experiments on Garnet MDPs confirm that this advantage translates into significant performance improvements. We further explain this gap through the action gap phenomenon in RL, which helps account for why some refinement strategies yield substantially better behavior in practice.
Initial Distribution Sensitivity of Constrained Markov Decision Processes
Alperen Tercan, Necmiye Ozay
Conference on Decision and Control 2025
link /
Constrained Markov Decision Processes (CMDPs) are notably more complex to solve than standard MDPs due to the absence of universally optimal policies across all initial state distributions. This necessitates re-solving the CMDP whenever the initial distribution changes. In this work, we analyze how the optimal value of CMDPs varies with different initial distributions, deriving bounds on these variations using duality analysis of CMDPs and perturbation analysis in linear programming. Moreover, we show how such bounds can be used to analyze the regret of a given policy due to unknown variations of the initial distribution.
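For intuition, here is a minimal sketch (not the paper's method; the transition kernel, rewards, costs, and budget below are made up for illustration) of a small discounted CMDP solved as a linear program over occupancy measures with cvxpy, re-solved for two initial distributions to see how the optimal value shifts.

```python
# A minimal CMDP-as-LP sketch with made-up numbers; solving over occupancy
# measures and re-solving for a second initial distribution illustrates the
# initial-distribution sensitivity of the optimal value.
import numpy as np
import cvxpy as cp

nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] transition kernel
r = rng.uniform(0, 1, size=(nS, nA))            # reward to maximize
c = np.tile(np.array([0.0, 1.0]), (nS, 1))      # action 1 is "costly", action 0 is free
budget = 3.0                                    # discounted cost budget (always feasible)

def cmdp_optimal_value(mu0):
    """Optimal discounted return of the CMDP for initial distribution mu0."""
    x = cp.Variable((nS, nA), nonneg=True)      # normalized discounted occupancy measure
    flow = [
        cp.sum(x[s, :]) ==
        (1 - gamma) * mu0[s] + gamma * cp.sum(cp.multiply(P[:, :, s], x))
        for s in range(nS)
    ]
    cost_con = [cp.sum(cp.multiply(c, x)) <= (1 - gamma) * budget]
    prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(r, x))), flow + cost_con)
    prob.solve()
    return prob.value / (1 - gamma)

v1 = cmdp_optimal_value(np.array([1.0, 0.0, 0.0]))
v2 = cmdp_optimal_value(np.array([0.0, 0.5, 0.5]))
print(f"optimal value from one initial distribution: {v1:.3f}, from another: {v2:.3f}")
```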
Efficient Reward Identification In Max Entropy Reinforcement Learning with Sparsity and Rank Priors
Mohamad Shehab*, Alperen Tercan*, Necmiye Ozay
Conference on Decision and Control 2025
link /
In this paper, we consider the problem of recovering time-varying reward functions from either optimal policies or demonstrations coming from a max entropy reinforcement learning problem. This problem is highly ill-posed without additional assumptions on the underlying rewards. However, in many applications, the rewards are indeed parsimonious, and some prior information is available. We consider two such priors on the rewards: 1) rewards are mostly constant and they change infrequently, 2) rewards can be represented by a linear combination of a small number of feature functions. We first show that the reward identification problem with the former prior can be recast as a sparsification problem subject to linear constraints. Moreover, we give a polynomial-time algorithm that solves this sparsification problem exactly. Then, we show that identifying rewards representable with the minimum number of features can be recast as a rank minimization problem subject to linear constraints, for which convex relaxations of rank can be invoked. In both cases, these observations lead to efficient optimization-based reward identification algorithms. Several examples are given to demonstrate the accuracy of the recovered rewards as well as their generalizability.
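As a toy illustration of the rank prior (with synthetic data, and hypothetical entrywise measurements standing in for the linear constraints that the max-entropy optimality conditions would impose), the convex relaxation of rank amounts to nuclear-norm minimization, which can be written directly in cvxpy.

```python
# A minimal sketch, not the paper's exact formulation: recover a low-rank
# time-varying reward matrix R (time steps x states) from linear constraints
# by minimizing its nuclear norm, the standard convex surrogate for rank.
import numpy as np
import cvxpy as cp

T, nS, k = 20, 15, 2
rng = np.random.default_rng(1)
R_true = rng.normal(size=(T, k)) @ rng.normal(size=(k, nS))   # rank-k ground truth

# Hypothetical entrywise observations of the rewards, standing in for the
# linear constraints derived from demonstrations or optimal policies.
n_obs = 150
rows = rng.integers(0, T, n_obs)
cols = rng.integers(0, nS, n_obs)
b = R_true[rows, cols]

R = cp.Variable((T, nS))
constraints = [R[rows[i], cols[i]] == b[i] for i in range(n_obs)]
prob = cp.Problem(cp.Minimize(cp.normNuc(R)), constraints)
prob.solve()

print("relative recovery error:",
      np.linalg.norm(R.value - R_true) / np.linalg.norm(R_true))
```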
Thresholded Lexicographic Ordered Multi-Objective Reinforcement Learning
Alperen Tercan, Vinayak Prabhu
European Conference on Artificial Intelligence (ECAI) 2024
link /
Our work on solving multi-objective control tasks with lexicographic preferences and temporal logic-based reward specifications combines optimization, RL, and formal methods. We develop a taxonomy of control tasks with thresholded lexicographic objectives and use it to show how current RL approaches fail in certain settings. We then discuss several fixes to existing methods to improve their performance. Finally, we propose and evaluate a novel policy gradient algorithm based on hypercone projections of the policy gradients.
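The sketch below illustrates the intuition behind such projections in plain numpy; it is a simplified stand-in rather than the exact algorithm from the paper: the update direction for a lower-priority objective is kept inside a cone of a chosen half-angle around the higher-priority gradient, so progress on the primary objective is not sacrificed.

```python
# Illustrative only: project the lower-priority gradient g_lo onto the cone
# of half-angle theta around the higher-priority gradient g_hi.
import numpy as np

def project_into_cone(g_lo, g_hi, theta):
    """Project g_lo onto the cone {d : angle(d, g_hi) <= theta}."""
    g_hi_unit = g_hi / np.linalg.norm(g_hi)
    along = np.dot(g_lo, g_hi_unit)                 # component along g_hi
    perp = g_lo - along * g_hi_unit                 # component orthogonal to g_hi
    perp_norm = np.linalg.norm(perp)
    if np.arctan2(perp_norm, along) <= theta:
        return g_lo                                 # already inside the cone
    # Closest point on the cone: project onto the boundary ray in the plane
    # spanned by g_hi and g_lo (zero if g_lo lies in the polar cone).
    perp_unit = perp / perp_norm
    scale = along * np.cos(theta) + perp_norm * np.sin(theta)
    return max(scale, 0.0) * (np.cos(theta) * g_hi_unit + np.sin(theta) * perp_unit)

g_primary = np.array([1.0, 0.0])      # gradient of the higher-priority objective
g_secondary = np.array([-0.2, 1.0])   # gradient of a lower-priority objective
print(project_into_cone(g_secondary, g_primary, theta=np.pi / 4))
```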
Synthesizing a Progression of Subtasks for Block-Based Visual Programming Tasks
Alperen Tercan et al.
AAAI 2024 Workshop on AI for Education
link /
State-of-the-art neural program synthesis methods fail on relatively complex problems (e.g., finding the mid-point of a line) that require multiple levels of nesting. Therefore, we developed a task decomposition framework that can automatically break a task into simpler subtasks which progressively lead to the original task. These subtasks can be used to create a curriculum for faster learning.
We then tested the effectiveness of our approach for an RL agent solving neural program synthesis problems. Moreover, we conducted a user study to evaluate the effectiveness of our task decomposition for human learners attempting to solve complex Karel tasks.
Provable Stateful Next Generation Access Control for Complex Dynamic Systems
In Preparation
link /
code /
Many real-world problems require flexible, scalable, and fine-grained access control policies. The Next Generation Access Control (NGAC) framework inherits these traits from Attribute-Based Access Control (ABAC) and provides an intuitive graph-based approach. In this work, we augment the NGAC framework with a multi-level rule hierarchy and stateful policies. We then show how an NGAC policy can be analyzed together with an environment model using Alloy. This allows complex dynamic systems to be defined while keeping the policies tractable for automated analysis. We demonstrate our approach on an emergency fire response problem.
Increased Reinforcement Learning Performance through Transfer of Representation Learned by State Prediction Model
Alperen Tercan, Charles W. Anderson
2021 International Joint Conference on Neural Networks (IJCNN)
link /
code /
slides /
We propose using state change predictions as an unbiased and non-sparse supplement to TD-targets. By training a forward model that shares a Q-network's initial layers, we allow transfer learning from model dynamics prediction to Q value function approximation. We discuss two variants, one that applies this only in the initial steps of training and another that uses it throughout. Both variants can be used as enhancements to state-of-the-art RL algorithms.
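Below is a minimal PyTorch sketch of the shared-encoder idea; the layer sizes, toy dimensions, and names are illustrative rather than the paper's exact configuration.

```python
# A minimal sketch: a Q-network head and a forward state-prediction head share
# the same initial encoder, so representation learned from predicting state
# changes transfers to Q-value approximation.
import torch
import torch.nn as nn

class SharedEncoderAgent(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        # Shared initial layers reused by both heads.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.q_head = nn.Linear(hidden, n_actions)                      # Q(s, .)
        self.dynamics_head = nn.Linear(hidden + n_actions, state_dim)   # predicts s' - s

    def q_values(self, state):
        return self.q_head(self.encoder(state))

    def predict_state_change(self, state, action_onehot):
        z = torch.cat([self.encoder(state), action_onehot], dim=-1)
        return self.dynamics_head(z)

agent = SharedEncoderAgent(state_dim=4, n_actions=2)
s = torch.randn(8, 4)
a = torch.nn.functional.one_hot(torch.randint(0, 2, (8,)), 2).float()
# An MSE loss on predicted vs. observed state changes gives dense training
# signal for the shared encoder, supplementing sparse TD-targets either early
# in training or throughout.
print(agent.q_values(s).shape, agent.predict_state_change(s, a).shape)
```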
Other Projects
These include coursework and side projects.
Reinforcement Learning for Combinatorial Optimization over Graphs
project
2020-05-01
slides /
In this project, the use of reinforcement learning for heuristics in graph algorithms is investigated. This line of research can be seen as an extension of the general trend in computer science to replace rule-based, hand-crafted heuristics with data-driven methods. The project focused on identifying possible future research directions rather than providing a comprehensive survey of the field.