In 2022 I completed a postdoc in Computer Science under the direction of Andrew McCallum, with a focus on structured representation learning. I obtained my Ph.D. in Mathematics from UMass Amherst in 2018, advised by Andrea Nahmod and Nestor Guillen. My Ph.D. research was in the field of partial differential equations, using techniques from harmonic analysis, calculus of variations, and Riemannian geometry. During my Ph.D. I also completed an MS in Computer Science at UMass Amherst, with a focus on machine learning. Prior to my graduate studies, I obtained a BA in Mathematics and Economics (double major) from Central Connecticut State University in 2010, graduating in two years with a 4.0 GPA.


I perform research on foundational aspects of machine learning, particularly representation learning and probabilistic modeling, with applications to natural language processing and structured prediction. I am especially interested in models with additional geometric, probabilistic, or set-theoretic structure, and in the ways this structure can be leveraged to give deep learning models additional capabilities and interpretability.

A working draft of my current work on developing a measure-theoretic framework for set representation learning is available below:

Measure-Theoretic Set Representation Learning (a preprint)
Michael Boratko, Dhruvesh Patel, Shib Sankar Dasgupta, and Andrew McCallum

The prime example of such a representation is box embeddings, a novel region-based representation learning model which also compactly represents a joint probability distribution.

Box Embeddings

Over the course of 10 research papers, I have provided methodological improvements and extensions to box embeddings, and applied them to a wide range of tasks such as collaborative filtering, textual entailment, and multi-label classification, obtaining state-of-the-art results. Box embeddings represent elements as hyperrectangles, i.e. Cartesian products of intervals, and can be thought of as trainable Venn diagrams with valid set-theoretic and probabilistic semantics. I formalized the probabilistic semantics of box embeddings (UAI 2021), proving that even models with softness can yield a valid probability distribution and, as a result, outperform competing probabilistic representation baselines. I also proved that box embeddings can represent any directed graph (NeurIPS 2021) and introduced a novel adaptation of box embeddings with a trainable "softness", improving learning to the point that they are the optimal choice for directed graph representation in any dimension.
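As a rough illustration of the "trainable Venn diagram" picture, the sketch below treats a box as a product of intervals and reads a conditional probability off the ratio of volumes. It is a minimal sketch only: it uses hard (non-smoothed) boxes and a uniform measure, and the names `Box`, `intersect`, `conditional_probability`, and the example boxes `animal` and `dog` are hypothetical, not the API of the box-embeddings library or the smoothed models from the papers.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """A hyperrectangle: the Cartesian product of intervals [lower_i, upper_i]."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def volume(self) -> float:
        # Side lengths are clamped at zero so an empty box has zero volume.
        return prod(max(u - l, 0.0) for l, u in zip(self.lower, self.upper))


def intersect(a: Box, b: Box) -> Box:
    """The intersection of two boxes is again a box (possibly empty)."""
    lower = tuple(max(x, y) for x, y in zip(a.lower, b.lower))
    upper = tuple(min(x, y) for x, y in zip(a.upper, b.upper))
    return Box(lower, upper)


def conditional_probability(a: Box, b: Box) -> float:
    """P(a | b) = Vol(a ∩ b) / Vol(b), assuming a uniform measure."""
    return intersect(a, b).volume() / b.volume()


# Hypothetical 2-dimensional boxes: `dog` sits entirely inside `animal`.
animal = Box((0.0, 0.0), (0.8, 0.8))
dog = Box((0.1, 0.1), (0.5, 0.5))

p_animal_given_dog = conditional_probability(animal, dog)  # containment gives 1
p_dog_given_animal = conditional_probability(dog, animal)  # strictly less than 1
```

Containment of `dog` in `animal` shows up as an asymmetric pair of conditional probabilities, which is what makes boxes suitable for modeling directed relations. Hard boxes like these have zero gradient when they are disjoint, which is one motivation for the softened variants mentioned above.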

My current research formalizes the set-theoretic aspects of box representations by casting them in a general framework, building from first principles the general requirements for any set-theoretic representation learning model. With this rigorous framework in hand, I plan to extend the expressivity of box embeddings in multiple ways: exploring alternate geometries (such as tori or hyperbolic space), combining boxes with probabilistic circuits, and developing deep box models capable of rich, probabilistically interpretable reasoning in their hidden layers. Developing this abstract framework and its requirements also lays bare the foundational principles on which such a representation should be built, facilitating analysis of alternative geometric, region-based, or distributional representations, and providing sound motivation for the invention of novel representation paradigms. Set-theoretic representation and reasoning is foundational, not only to mathematics and probability but also to the organization of ideas into concepts and the general logic of thought. Many machine learning objectives therefore stand to benefit from this work, ranging from abstract objectives such as hierarchical representations, compositional generalization, and reasoning to more immediate practical tasks such as information retrieval, question answering, and collaborative filtering.

While vectors in Euclidean space have formed the basis of most machine learning architectures, several promising lines of work are exploring the extent to which objects with additional geometric structure may provide distinct benefits. Vectors in hyperbolic space, for example, can model trees with lower distortion, an idea which has been extended to Riemannian and Lorentzian manifolds wherein the model learns the curvature best suited to the data. Gaussian embeddings, on the other hand, learn representations of the data which are themselves distributions, allowing the representation to capture some notion of uncertainty and also providing natural asymmetric measures (e.g. KL divergence) between the representations. Finally, region-based embeddings represent elements using regions such as cones, disks, and boxes, giving access to a rich set of set-theoretic operations and relations such as intersection, complement, and containment. With an appropriate measure on the embedding space, one can also calculate volumes of these regions, allowing the embeddings to be interpreted with rigorous probabilistic semantics. Of these various choices of region-based representations, box embeddings stand out due to their representational capacity, tractable computability, and simplicity of implementation.
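To make the asymmetry of KL divergence between Gaussian embeddings concrete, here is a small sketch using the standard closed form for diagonal-covariance Gaussians. The function name `kl_diag_gaussians` and the two example embeddings are hypothetical, chosen only for illustration, and are not taken from any particular Gaussian embedding model.

```python
from math import log
from typing import Sequence


def kl_diag_gaussians(
    mu_p: Sequence[float],
    var_p: Sequence[float],
    mu_q: Sequence[float],
    var_q: Sequence[float],
) -> float:
    """KL(p || q) for Gaussians with diagonal covariance, summed over dimensions."""
    return 0.5 * sum(
        vp / vq + (mq - mp) ** 2 / vq - 1.0 + log(vq / vp)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


# Hypothetical 1-dimensional embeddings: a narrow ("specific") concept
# and a broad ("general") concept centered at the same point.
specific = ([0.0], [1.0])
general = ([0.0], [4.0])

kl_specific_to_general = kl_diag_gaussians(*specific, *general)
kl_general_to_specific = kl_diag_gaussians(*general, *specific)

# The two directions differ, so the divergence can encode a directed relation.
is_asymmetric = kl_specific_to_general != kl_general_to_specific
```

Here the divergence from the narrow distribution to the broad one is the smaller of the two, which is the sense in which a broad Gaussian can be read as "covering" a narrow one.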

General Research Interests

More broadly, I am interested in program synthesis, deep learning, optimization techniques, and interpretability. I am also interested in game theory, parallel and distributed algorithms, programming languages and theory of computation. I enjoy exploring the theoretical underpinnings of machine learning and identifying areas where my mathematical background can be leveraged to improve performance and solve real problems of practical importance with a large impact.


  • (December 10, 2021) Released geometric-graph-embedding, a Python library for training representations which can model directed graphs. This library contains clean implementations of many geometric embedding methods: Vector Similarity and Distance, Bilinear, ComplEx, Order Embeddings and Probabilistic Order Embeddings, Hyperbolic Embeddings including variants of squared Lorentzian distance, Hyperbolic Entailment Cones, Gumbel Boxes, and a novel t-Box model with trainable temperatures.
  • (July 27, 2021) Released box-embeddings, a Python library for geometric representation learning compatible with PyTorch and TensorFlow. This library provides box embeddings as a module, allowing easy interoperability with existing deep learning frameworks. It also serves as a reference implementation, following current recommended practices for training and numerical stability, with comprehensive documentation.
  • (January 6, 2021) Released ProtoQA, a question answering dataset for prototypical commonsense reasoning. The dataset is available from GitHub and from the Hugging Face dataset module. Also released the ProtoQA Evaluator, an extensible evaluation framework for the ProtoQA dataset that implements all three similarity measures (exact match, WordNet, and RoBERTa) from the original paper.



  1. Modeling Transitivity and Cyclicity in Directed Graphs via Binary Code Box Embeddings
    Dongxu Zhang, Michael Boratko, Cameron Musco, Andrew McCallum. NeurIPS 2022.
  2. Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings Slides
    Shib Sankar Dasgupta*, Michael Boratko*, Siddhartha Mishra, Shriya Atmakuri, Dhruvesh Patel, Xiang Lorraine Li, Andrew McCallum. ACL 2022.
  3. Modeling Label Space Interactions in Multi-label Classification using Box Embeddings
    Dhruvesh Patel, Pavitra Dangati, Jay-Yoon Lee, Michael Boratko, Andrew McCallum. ICLR 2022.
  4. An Evaluative Measure of Clustering Methods Incorporating Hyperparameter Sensitivity
    Siddhartha Mishra, Nicholas Monath, Michael Boratko, Ari Kobren, Andrew McCallum. AAAI 2022.
  5. Capacity and Bias of Learned Geometric Embeddings for Directed Graphs Video Slides Code
    Michael Boratko*, Dongxu Zhang*, Nicholas Monath, Luke Vilnis, Andrew McCallum. NeurIPS 2021.
  6. Box Embeddings: An Open-Source Library for Representation Learning using Geometric Structures Video Slides Code
    Tejas Chheda*, Purujit Goyal*, Trang Tran*, Dhruvesh Patel, Michael Boratko, Shib Sankar Dasgupta, Andrew McCallum. EMNLP (Demo Track) 2021.
  7. Min/Max Stability and Box Distributions (Long Presentation, 48/777 ≈ 6%) Video Slides
    Michael Boratko, Javier Burroni, Shib Sankar Dasgupta, Andrew McCallum. UAI 2021.
  8. Modeling Fine-Grained Entity Types with Box Embeddings
    Yasumasa Onoe, Michael Boratko, Andrew McCallum, Greg Durrett. ACL 2021.
  9. Probabilistic Box Embeddings for Uncertain Knowledge Graph Reasoning
    Xuelu Chen*, Michael Boratko*, Muhao Chen, Shib Sankar Dasgupta, Xiang Lorraine Li, Andrew McCallum. NAACL 2021.
  10. Improving Local Identifiability in Probabilistic Box Embeddings
    Shib Sankar Dasgupta*, Michael Boratko*, Dongxu Zhang, Luke Vilnis, Xiang Lorraine Li, Andrew McCallum. NeurIPS 2020.
  11. ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning
    Michael Boratko*, Xiang Lorraine Li*, Rajarshi Das*, Tim O'Gorman*, Dan Le, Andrew McCallum. EMNLP 2020.
  12. Representing Joint Hierarchies with Box Embeddings
    Dhruvesh Patel*, Shib Sankar Dasgupta*, Michael Boratko, Xiang Lorraine Li, Luke Vilnis, Andrew McCallum. AKBC 2020.
  13. Smoothing the Geometry of Probabilistic Box Embeddings (Oral Presentation, 24/1591 ≈ 1.5%)
    Xiang Li*, Luke Vilnis*, Dongxu Zhang, Michael Boratko, Andrew McCallum. ICLR 2019.
  14. An Interface for Annotating Science Questions
    Michael Boratko, Harshit Padigela, Divyendra Mikkilineni, Pritish Yuvraj, Rajarshi Das, Andrew McCallum, Maria Chang, Achille Fokoue, Pavan Kapanipathi, Nicholas Mattei, Ryan Musa, Kartik Talamadupula, Michael J. Witbrock. EMNLP (Demo Track) 2018.


  1. Box-To-Box Transformations for Modeling Joint Hierarchies
    Shib Sankar Dasgupta, Xiang Lorraine Li, Michael Boratko, Dongxu Zhang, Andrew McCallum. Rep4NLP Workshop at ACL 2021.
  2. A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset (Best Paper Award)
    Michael Boratko, Harshit Padigela, Divyendra Mikkilineni, Pritish Yuvraj, Rajarshi Das, Andrew McCallum, Maria Chang, Achille Fokoue-Nkoutche, Pavan Kapanipathi, Nicholas Mattei, Ryan Musa, Kartik Talamadupula, Michael Witbrock. MRQA Workshop at EMNLP 2018.

Mathematics Ph.D. Research

As mentioned above, for my mathematics Ph.D. I studied partial differential equations using techniques from harmonic analysis, calculus of variations, and Riemannian geometry. My thesis consists of two parts, the first of which improves bounds on the Sobolev norms of solutions to the nonlinear Schrödinger equation in dimensions 2, 3, and 4. In the second part I proved a uniqueness theorem for solutions to a class of degenerate elliptic partial differential equations. The thesis is published online and can be accessed from ScholarWorks.

Additional Interests

During high school I started a website development and IT consulting company called Starstreak, which gave me experience with a wide range of software and hardware, from niche industries to the enterprise level. Running the business also exposed me to a diverse clientele, many of whom became my friends over the years.

I also have a love for music, and am fortunate to have played trumpet and piano, and even sung, with very talented musicians in classical, jazz, and rock bands. At the moment I sing and arrange music for a horn band which takes inspiration (and occasionally direct transcriptions!) from artists such as Lawrence and Cory Wong to perform everything from modern pop like Maroon 5 to classic funk and soul like Tower of Power.