How To Read This Book

Welcome to Logic for Systems! Here are some quick hints that will help you use this book effectively.

This book is a draft!

This book is a draft, and there are some sections that are currently being filled in. If you want to use these materials and need support (e.g., you want to use the Forge homeworks that go with it, or a specific section you need is incomplete), please contact Tim_Nelson@brown.edu.

Especially during the Spring semester at Brown University, the deployed content of this book may be expanded and improved.

  • If you are a Brown University student taking CSCI 1710, expect the book to be edited as the semester proceeds.
  • If you are using the book in your own course or for your own studies, and want a "frozen" version to ensure consistency, we'd be happy to assist you. Please reach out to Tim_Nelson@brown.edu.

Organization

The book is organized into a series of short sections, which are grouped into chapters:

  • Chapter 1 (Beyond Testing) briefly motivates the content in this book and sets the stage with a new technique for testing your software.
  • Chapter 2 (Modeling Static Scenarios) provides an introduction to modeling systems in Forge by focusing on systems that don't change over time.
  • Chapter 3 (Discrete Event Systems) shows a common way to model the state of a system changing over time.
  • Chapter 4 (Modeling Relationships) enriches the modeling language to support arbitrary relations between objects in the world.
  • Chapter 5 (Temporal Specification) covers temporal operators, which are commonly used in industrial modeling and specification, and how to use them.
  • Chapter 6 (Case Studies) touches on some larger applications of lightweight formal methods. Some of these will involve large models written in Forge, and others will lean more heavily on industrial systems.
  • The Forge Documentation, which covers the syntax of the language more concisely and isn't focused on teaching. At the moment, this is in a separate document in order to make searching easier.

Each chapter contains a variety of examples: data structures, puzzles, algorithms, hardware concepts, etc. We hope that the diversity of domains covered means that everyone will see an example that resonates with them. Full language and tool documentation come after the main body of the book.

What does this book assume? What is its goal?

This book does not assume any prior background with formal methods or even discrete math. It does assume the reader has written programs before at the level of an introductory college course.

The goal of this chapter progression is to prepare the reader to formally model and reason about a domain of their own choosing in Forge (or perhaps in a related tool, such as an SMT solver).

With that in mind...

Do More Than Read

This book is example driven, and the examples are almost always built up from the beginning. The flow of the examples is deliberate, and might even take a "wrong turn" that is meant to teach a specific lesson before changing direction. If you try to read the book passively, you're likely to be very disappointed. Worse, you may not actually be able to do much with the material after reading.

Instead, follow along, pasting each snippet of code or Forge model into the appropriate tool, and try it! Better yet, try modifying it and see what happens. You'll get much more out of each section as a result. Forge especially is designed to aid experimentation. Let your motto be:

Let's find out!


With JavaScript enabled, the table of contents (to the left, by default) will allow you to select a specific section of this book. Likewise, the search bar (enabled via the "Toggle Searchbar" icon) should allow you to search for arbitrary alphanumeric phrases in the full text; unfortunately, non-alphanumeric operators are not supported by search at present.


Callout Boxes

Callout boxes can give valuable warnings, helpful hints, and other supplemental information. They are color- and symbol-coded depending on the type of information. For example:

This is a helpful tip.

Make sure to stay hydrated; it will help you learn.

This is a warning!

Look both ways before you cross the street!

This is a side note.

This book is written using the mdbook package.

If you see a callout labeled "CSCI 1710", it means that it's specifically for students in Brown University's CSCI 1710 course, Logic for Systems.


Exercises

Every now and then, you'll find question prompts, followed by a clickable header that looks like this:

Think, then click!

SPOILER TEXT

If you click the arrow, it will expand to show hidden text, often revealing an answer or some other piece of information that is meant to be read after you've thought about the question. When you see these exercises, don't skip past them, and don't just read the hidden text.

Thanks To

The current draft has benefitted from the feedback of many people, including: Shriram Krishnamurthi, Emily Nelson, Hillel Wayne, ...

Our Value Proposition

Everybody has endless demands on their time. If you're a student, you might be deciding which classes to take. There's never enough time to take them all, so you need to prioritize based on expected value. If you're a professional, you're deciding how to best use your limited "free" time to learn new skills and stay current. Either way, you're probably wondering: What good is this book? (And if you aren't asking that, you ought to be.)

You need many different skills for a successful career. This book won't teach you how to work with other people, or manage your tasks, or give and receive feedback. It won't teach you to program either; there are plenty of other books for that. Instead, this book will teach you:

  • how to think more richly about what matters about a system;
  • how to better express what you want from it;
  • how to more thoroughly evaluate what a system actually does give you; and
  • how to use constraints and constraint solvers in your work (because we'll use them as tools to help us out). It will also give you a set of baseline skills that will aid you in using any further formal-methods techniques you might encounter in your work, such as advanced type systems, program verification, theorem proving, and more.

Modeling: What really matters?

There's a useful maxim by George Box: "All models are wrong, but some are useful". The only completely accurate model of a system is that system itself, including all of its real external context. This is impractical; instead, a modeler needs to make choices about what really matters to them: what do you keep, and what do you disregard? Done well, a model gets at the essence of a system. Done poorly, a model yields nothing useful or, worse, gives a false sense of security.

George Box (1978)

I suspect that people were saying "All models are wrong" long before Box did! But it's worth reading this quote of his from 1978, and thinking about the implications.

Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law relating pressure, volume and temperature of an "ideal" gas via a constant is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules. For such a model there is no need to ask the question "Is the model true?". If "truth" is to be the "whole truth" the answer must be "No". The only question of interest is "Is the model illuminating and useful?".

(Bolding mine. The text is taken from pages 2 through 3 of the original paper.)

If you want to do software (or hardware) engineering, some amount of modeling is unavoidable. Here are two basic examples of many.

Data Models Everywhere

You might already have benefitted from a good model (or suffered from a poor one) in your programming work. Whenever you write data definitions or class declarations in a program, you're modeling. The ground truth of the data is rarely identical to its representation. You decide on a particular way that it should be stored, transformed, and accessed. You say how one piece of data relates to another.

Your data-modeling choices affect more than just execution speed: if a pizza order can't have a delivery address that is separate from the user's billing address, important user needs will be neglected. On the other hand, it is probably OK to leave the choice of cardboard box out of the user-facing order. An order has a delivery time, which probably comes with a time zone. You could model the time zone as an integer offset from UTC, but this is a very bad idea. And, since there are 24 hours in a day, the real world imposes range limits: a time zone that's a million hours ahead of UTC is probably a buggy value, even though 1000000 fits comfortably within even a signed 32-bit int.
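As a tiny illustration of building such a range limit into the data model itself, here's a Python sketch. The class, field names, and bounds are illustrative assumptions, not anything prescribed by the text (and real UTC offsets even include half- and quarter-hour zones, which this sketch ignores for simplicity).

```python
from dataclasses import dataclass

@dataclass
class DeliveryTime:
    """A (hypothetical) delivery time for an order."""
    hour: int        # 0-23, local time
    utc_offset: int  # whole hours from UTC, for simplicity

    def __post_init__(self):
        # Real-world offsets fall in a narrow band (roughly -12 to +14),
        # even though an int could happily hold 1000000.
        if not -12 <= self.utc_offset <= 14:
            raise ValueError(f"implausible UTC offset: {self.utc_offset}")
        if not 0 <= self.hour <= 23:
            raise ValueError(f"invalid hour: {self.hour}")
```

With this model, the buggy value is rejected at construction time (`DeliveryTime(hour=12, utc_offset=1000000)` raises `ValueError`) rather than flowing silently through the rest of the program.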

Data vs. Its Representation

The level of abstraction matters, too. Suppose that your app scans handwritten orders. Then handwriting becomes pixels, which are converted into an instance of your data model, which is implemented as bytes, which are stored in hardware flip-flops and so on. You probably don't need, or want, to keep all those perspectives in mind simultaneously. Languages are valuable in part because of the abstractions they foster, even if those abstractions are incomplete—they can be usefully incomplete! What matters is whether the abstraction level suits your needs, and your users'.

Memory Management

I learned to program in the 1990s, when practitioners were at odds over automated vs. manual memory management. It was often claimed that a programmer needed to really understand what was happening at the hardware level, and manually control memory allocation and deallocation for the sake of performance. Most of us don't think that anymore, unless we need to! Sometimes we do; often we don't. Focus your attention on what matters for the task at hand.

The examples don't stop: In security, a threat model says what powers an attacker has. In robotics and AI, reinforcement learning works over a probabilistic model of real space. And so on. The key is: what matters for your needs? Box had something to say about that, too (in 1976):

Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about safety from mice when there are tigers abroad.

Specification: What do you want?

Suppose that I want to store date-and-time values in a computer program. That's easy enough to say, right? But the devil is in the details: What is the layout of the data? Which fields will be stored, and which will be omitted? Which values are valid, and which are out of bounds? Is the format efficiently serializable? How far backward in time should the format extend, and how far into the future should it reach?

And which calendar are we using, anyway?

Yes, that's a real question.

If our programs are meant to work with dates prior to the 1600s, only their historical context can say whether they should be interpreted with the Gregorian calendar or the Julian calendar. And that's just two (Eurocentric) possibilities!

If you're just building a food delivery app, you probably only need to think about some of these aspects of dates and times. If you're defining an international standard, you need to think about them all.

Either way, being able to think carefully about your specification can separate quiet success from famous failure.

Validation and Verification: Did you get what you wanted?

Whether you're working out an algorithm on paper or checking a finished implementation, you need some means of judging correctness. Here, too, precision (and a little bit of adversarial thinking) matters in industry:

  • When ordinary testing isn't good enough, techniques like fuzzing, property-based testing, and others give you new evaluative power.
  • When you're updating, refactoring, or optimizing a system, a model of its ideal behavior can be leveraged for validation. Here's an example from 2014—click through the header of the linked article to read the original blog post.
  • A model of the system's behavior is also useful for test-case generation, enabling tools to generate test suites with higher coverage of the potential state space.

And all that's even before we consider more heavyweight methods, like model checking and program verification.

Formalism Isn't Absolute

The word "formal" has accumulated some unfortunate connotations: pedantry, stuffiness, ivory-tower snootiness, being an architecture astronaut, etc. The truth is that formalism is a sliding scale. We can take what we need and leave the rest. What really matters is the ability to precisely express your goals, and the ability to take advantage of that precision.

In fact, formalism powers many software tools that help us to reason about the systems we create. In the next section, we'll start sketching what that means for us as engineers and humans.

Logic for Systems

Setting the Stage

If you're reading this book, you've probably had to complete some programming assignments—or at least written some small program for a course or an online tutorial. Take a moment to list a handful of such assignments: what did you have to build?

Now ask yourself:

  • How did you know what behavior to implement?
  • How did you know which data structures or algorithms were the right ones to use?
  • How did you know your program "worked", in the end?

In the context of assignments, there are expected answers to these questions. For instance, you might say you know your code worked because you tested it (very thoroughly, I'm sure)! But is that really the truth? In terms of consequences, the true bar for excellence in a programming class is the grade you got. That is:

  • You knew what to do because you were told what to do.
  • You probably knew which algorithms to use because they'd just been taught to you.
  • You were confident that your programs worked because you were told by an authority figure.

But outside that context, as (say) a professional engineer, you lose the safety net. You might be working on a program that controls the fate of billions of dollars, tempers geopolitical strife, or controls a patient's insulin pump. Even if you had a TA, would you trust them to tell you that those programs worked? Would you trust your boss to understand exactly what needed to happen, and tell you exactly how to do it? Probably not! Instead, you need to think carefully about what you want, how to build it, and how to evaluate what you and others have built.

Don't let the perfect be the enemy of the good.

As engineers, we strive for perfection. But perfection is an ideal; it's not obtainable. Why? First: we're human. Even if we could read our customers' minds, that's no guarantee that they know what they really need. And even if we can prove our code is correct, we might be checking for the wrong things. Second: our environment is hostile. Computers break. Patches to dependencies introduce errors. Cosmic radiation can flip bits in memory. 100% reliability is hopeless. Anyone who tells you differently is trying to sell you something.

But that doesn't mean we should give up. It just means that we should moderate our expectations. Instead of focusing on perfect correctness, instead try to increase your confidence in correctness.

Unit Testing

Hopefully we all agree that unit testing with concrete input-output pairs has its virtues and that we should keep doing it. But let's investigate what it does and doesn't do well.

Exercise: Make two lists: What does unit testing do well? What doesn't it do well? (Hint: Why do we test? What could go wrong, and how can the sort of testing you've done in other classes let us down?)

Think, then click!

You might have observed that (for most interesting programs, anyway) tests cannot be exhaustive, because there are infinitely many possible inputs. And since we're forced to test non-exhaustively, we have to hope we pick good tests: tests that exercise not only our own implementation, but also others (like the implementation that eventually replaces ours).

Worse, we can't test the things we don't think of, or don't know about; we're vulnerable to our limited knowledge, the availability heuristic, confirmation bias, and so on. In fact, we humans are generally ill-equipped for logical reasoning, even if trained.

Humans and Reasoning

A Toy Example

Suppose we're thinking about the workings of a small company. We're given some facts about the company, and have to answer a question based on those facts. Here's an example. We know that:

  • Alice directly supervises Bob.
  • Bob directly supervises Charlie.
  • Alice graduated Brown.
  • Charlie graduated Harvale.

To keep things simple, we'll assume that all three people graduated some university.

Exercise: Does someone who graduated from Brown directly supervise someone who graduated from another university?

Think, then click.

Yes! Regardless of whether Bob graduated from Brown, some Brown graduate supervises some non-Brown graduate. Reasoning by hypotheticals, there is one fact we don't know: where Bob graduated. If he graduated from Brown, then he supervises Charlie, a non-Brown graduate. If he graduated from another school, then he's supervised by Alice, a Brown graduate.

Humans tend to be very bad at reasoning by hypotheticals. There's a temptation to think that this puzzle isn't solvable because we don't know where Bob graduated from. Even Tim thought this at first after seeing the puzzle—in grad school! For logic!


Now imagine a puzzle with a thousand of these unknowns. A thousand boolean variables means 2^1000 cases to reason through. Want to use a computer yet?
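Since the only unknown in the toy puzzle is where Bob graduated, a computer can settle it by brute force over the cases. Here's a minimal Python sketch (not Forge); the encoding and names are illustrative.

```python
# Brute-force check of the supervision puzzle: regardless of where Bob
# graduated, some Brown graduate directly supervises a non-Brown graduate.

def property_holds(bob_school: str) -> bool:
    schools = {"Alice": "Brown", "Bob": bob_school, "Charlie": "Harvale"}
    supervises = [("Alice", "Bob"), ("Bob", "Charlie")]
    return any(schools[boss] == "Brown" and schools[worker] != "Brown"
               for boss, worker in supervises)

# The one fact we don't know is Bob's school; check every case.
assert all(property_holds(s) for s in ["Brown", "Harvale", "Anywhere Else"])
```

The two hypothetical cases from the spoiler above fall out directly: when Bob's school is "Brown", the witnessing pair is (Bob, Charlie); otherwise it is (Alice, Bob).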

This isn't really about logic puzzles.

A Real Scenario

There's a real cryptographic protocol called the Needham-Schroeder public-key protocol. You can read about it here. Unfortunately the protocol has a bug: it's vulnerable to attack if one of the principals can be fooled into starting an exchange with a badly-behaved or compromised agent. We won't go into specifics. Instead, let's focus on the fact that it's quite easy to get things like protocols wrong, and sometimes challenging for us humans to completely explore all possible behaviors -- especially since there might be behaviors we'd never even considered! It sure would be nice if we could get a computer to help out with that.

A pair of former 1710 students did an ISP on modeling crypto protocols, using the tools you'll learn in class. Here's an example picture, generated by their model, of the flaw in the Needham-Schroeder protocol:

An execution of the Needham-Schroeder protocol.

You don't need to understand the specifics of the visualization; the point is that someone who has studied crypto protocols would. And this really does show the classic attack on Needham-Schroeder. You may not be a crypto-protocol person, but you probably are an expert in something subtle that you'd like to model, reason about, and understand better.

Brown CSCI 1710

In fact, if you're reading this as part of your coursework for CSCI 1710, you will be expected to select, research, and model something yourself based on your interests. This is one of our main end-goals for the course.

Automated Reasoning as an Assistive Device

The human species has been so successful, in part, because of our ability to use assistive devices—tools! Eyeglasses, bicycles, hammers, bridges: all devices that assist us in navigating the world in our fragile meat-bodies. One of our oldest inventions, writing, is an assistive device that increases our long-term memory space and makes that memory persistent. Computers are only one in a long line of such inventions.

So, naturally, we've found ways to:

  • use computers to help us test our ideas;
  • use computers to exhaustively check program correctness;
  • use computers to help us find gaps in our intuition about a program;
  • use computers to help us explore the design space of a data structure, algorithm, card game, or chemical reaction;
  • etc.

There's a large body of work in Computer Science that uses logic to do all those things. We tend to call it formal methods, especially when the focus is on reasoning about systems. It touches on topics like system modeling, constraint solving, program analysis, design exploration, and more. That's what this book is about: the foundational knowledge to engage with many different applications of these ideas, even if you don't end up working with them directly every day.

More concretely, we'll focus on a class of techniques called lightweight formal methods, which are characterized by tradeoffs that favor ease of use over strong guarantees (although we'll sometimes achieve those as well).

Further Reading

Jeanette Wing and Daniel Jackson wrote a short article coining the term "lightweight FM" in the 90's, which you can find online.

What is a system?

When we say "systems" in this book we don't necessarily mean the kind of systems you see in a class on networks, hardware architecture, or operating systems. You can apply the techniques in this book to those subjects quite naturally, but you can also apply them to user interfaces, type systems in programming, hardware, version control systems like Git, web security, cryptographic protocols, robotics, puzzles, sports and games, and much more. So we construe the word "system" very broadly.

Here are some examples of "systems" that students have modeled in Forge: lifetimes in Rust, network reachability, and poker!

  • Rust Lifetimes and Borrowing, by Thomas Castleman and Ria Rajesh (course project)
  • Network Reachability, by Tim Nelson and Pamela Zave
  • Texas Hold 'Em, by Matthew Boranian and Austin Lang (course project)

The Future of Computing

For better or worse, the shape of engineering is changing. Lots of people are excited, scared, or both about large language models like ChatGPT. This book won't teach you how to use generative AI, so it's reasonable to wonder: why learn from this book, instead of reading yet another book on another (deservedly) hot topic, like machine learning?

There are two questions that will never go out of style, and won't be answered by AI (at least, not in our lifetimes):

  • What do you want to build? What does your customer really need? Answering this requires talking to them and other stakeholders, watching their processes, seeking their feedback, and adjusting your design based on it. And no matter who (or what) is writing the actual code, you need to be able to express all this precisely enough that they (or it) can succeed at the implementation.
  • How will you evaluate what you get? No matter who (or what) is building the system, verification is needed before the system can be trusted.

Even setting aside the customer-facing aspects, we'll still need to think critically about what it is we want and how to evaluate whether we're getting it. The skills you learn here will remain useful (or become even more so) as engineering evolves. In the next chapter, we'll try to convince you that these skills will be useful for more than just code.

"Formal Methods"

Formal methods (FM) are ways to help you think carefully about a domain, process, or system. They use math-based techniques (which are usually implemented in tools) to help. They aren't an academic exercise; they are used widely in industry and have likely saved billions of dollars and thousands of lives.

Some industrial examples I'm fond of include:

  • Amazon Web Services' Zelkova, which helps administrators author better security policies for their services. This book will give you the tools to build a policy-analysis system like Zelkova yourself.
  • Microsoft's static driver verifier, which helps increase the reliability of low-level device drivers in Windows. While this book doesn't cover the techniques they used, I love to showcase this work (which helped Microsoft ship more stable drivers and, at this point, is now decades old).
  • MongoDB's work on modeling replication, which found a real bug in their code. Quoting the linked page: "We've never encountered this issue in testing or in the field and only found it by reasoning about the edge cases. This shows writing and model checking ... specs is an excellent alternative way to find and verify edge cases." (Ellipsis mine.) We won't use this exact tool, but we'll cover other model checkers in this book.

We can find real applications for FM outside Computer Science, too; even the law. Here's an article about the value of modeling legal concepts to find loopholes in the law. This is the sort of FM we'll be learning how to do in 1710.

This Github repository keeps a (reasonably up to date, but not exhaustive!) list of other industrial applications of formal methods. Check it out!

Exercise

Can you think of one or two domains, systems, or processes that especially interest you? Think about the kinds of "system" you interact with regularly or have learned about in your life. What would you like to understand better about those systems?

Remember that we construe the word system broadly. A cryptographic protocol is a system, but so is the game of baseball. A data structure is a system, but so are chemical reactions.

Looking Ahead: Tools

The main tool we'll use in this book is Forge, a tool for modeling systems. In the course of the book, we'll be progressing through three language levels in Forge:

  • Froglet, which restricts the set of operations so that we can jump right in more easily. If you have intuitions about object-oriented programming, those intuitions will be useful in Froglet, although there are a few important differences that we'll talk about.
  • Relational Forge, which expands the set of operations available to include sets, relations, and relational operators. These are useful for reasoning about complex relationships between objects and for representing certain domains, like databases or graphs.
  • Temporal Forge, which helps us cleanly model how a system's state evolves over time. Temporal Forge is based on the industry-standard specification language LTL—Linear Temporal Logic.

We'll also use some other tools, like:

  • Hypothesis, a testing library for Python; and
  • Z3, an SMT solver library.

From Tests to Properties

We'll talk about more than just software soon. For now, let's go back to testing. Most of us have learned how to write test cases. Given an input, here's the output to expect. Tests are a kind of pointwise specification; a partial one, and not great for fully describing what you want, but a kind of specification nonetheless. They're cheap, non-trivially useful, and better than nothing.

But they also carry our biases, they can't cover an infinite input space, etc. Even more, they're not always adequate carriers of intent: if I am writing a program to compute the statistical median of a dataset, and write assert median([1,2,3]) == 2, what exactly is the behavior of the system I'm trying to confirm? Surely I'm not writing the test because I care specifically about [1,2,3] only, and not about [3,4,5] in the same way? Maybe there was some broader aspect, some property of median I cared about when I wrote that test.

Exercise: What do you think it was? What makes an implementation of median correct?

Think, then click!

There might be many things! One particular idea is that, if the input list has odd length, the median needs to be an element of the list. Or that, once the set is sorted, the median should be the "middle" element.


There isn't always an easy-to-extract property for every unit test. But this idea—encoding goals instead of specific behaviors—forces us to start thinking critically about what exactly we want from a system and helps us to express it in a way that others (including, perhaps, LLMs) can better use. It's only a short hop from there to some of the real applications we talked about last time, like verifying firewalls or modeling the Java type system.
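To make the median discussion concrete, here is a sketch of checking properties of a median function rather than single input-output pairs. Hypothesis (introduced later in this chapter) automates the input generation; to keep the sketch self-contained it uses the standard library's `random` module instead, and the `median` implementation here is an illustrative assumption.

```python
import random

def median(xs):
    # Illustrative reference behavior for odd-length lists:
    # the middle element of the sorted list.
    return sorted(xs)[len(xs) // 2]

def check_median_properties(trials=200):
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(trials):
        n = rng.randrange(1, 30, 2)               # odd lengths only
        xs = [rng.randint(-100, 100) for _ in range(n)]
        m = median(xs)
        # Property: for odd-length input, the median is in the list.
        assert m in xs
        # Property: the median is the middle element once sorted.
        assert sorted(xs)[n // 2] == m
    return True

assert check_median_properties()
```

Notice that neither property mentions [1,2,3] specifically: the assertions capture the *intent* behind that single test case, over whatever inputs we throw at the function.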

Sometimes, you can test exhaustively!

Sometimes the input space is small enough that exhaustive testing works well. This blog post, entitled "There are only four billion floats" is an example.

Depending on your experience, this may also be a different kind of testing from what you're used to. Building a repertoire of different tools is essential for any engineer!
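As a toy illustration of exhaustive checking, here is a Python sketch that tests a simulated 16-bit absolute-value function on all 65,536 inputs. The function and the wraparound behavior are illustrative assumptions (mimicking C's int16_t), not taken from the linked post, which does the analogous thing for all 32-bit floats.

```python
def abs16(x: int) -> int:
    # Absolute value with 16-bit two's-complement wraparound,
    # as a C int16_t would behave.
    r = -x if x < 0 else x
    return (r + 2**15) % 2**16 - 2**15

def failing_inputs():
    # Exhaustively check every 16-bit signed value: is the result
    # ever negative?
    return [x for x in range(-2**15, 2**15) if abs16(x) < 0]

# Exhaustive checking finds the one input that a handful of spot
# tests would likely miss: -32768 has no positive counterpart in
# 16-bit two's complement, so abs16(-32768) wraps to -32768.
assert failing_inputs() == [-2**15]
```

A few hand-picked test cases would almost certainly pass here; checking the whole input space is what surfaces the single counterexample.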

A New Kind of Testing

Cheapest Paths

Consider the problem of finding cheapest paths in a weighted graph. There are quite a few algorithms you might use: Dijkstra, Bellman-Ford, even a plain breadth-first search for an unweighted graph. You might have implemented one of these for another class!

The problem statement seems simple: take a graph and two vertex names as input, and produce the cheapest path between those two vertices in the graph. But it turns out that this problem hides a lurking issue.

Exercise: Find the cheapest path from vertex G to vertex E on the graph below.

Think, then click!

The path is G to A to B to E.

Great! We have the answer. Now we can go and add a test case with that graph as input and (G, A, B, E) as the output.

Wait -- you found a different path? G to D to B to E?

And another path? G to H to F to E?


If we add a traditional test case corresponding to one of the correct answers, our test suite will falsely raise alarms for correct implementations that happen to find different answers. In short, we'll be over-fitting our tests to one specific implementation: ours. But there's a fix. Maybe instead of writing:

shortest(GRAPH, G, E) == [(G, A), (A, B), (B, E)]

we write:

shortest(GRAPH, G, E) == [(G, A), (A, B), (B, E)] or
shortest(GRAPH, G, E) == [(G, D), (D, B), (B, E)] or
shortest(GRAPH, G, E) == [(G, H), (H, F), (F, E)]

Exercise: What's wrong with the "big or" strategy? Can you think of a graph where it'd be unwise to try to do this?

Think, then click!

There are at least two problems. First, we might have missed some possible solutions, which is quite easy to do; the first time Tim was preparing these notes, he missed the third path above! Second, there might be an unmanageable number of equally correct solutions. The most pathological case might be something like a graph with all possible edges present, all of which have weight zero. Then, every path is cheapest.


This problem -- multiple correct answers -- occurs in every part of Computer Science. Once you're looking for it, you can't stop seeing it. Most graph problems exhibit it. Worse, so do most optimization problems. Unique solutions are convenient, but the universe isn't built for our convenience.

Exercise: What's the solution? If test cases won't work, is there an alternative? (Hint: instead of defining correctness bottom-up, by small test cases, think top-down: can we say what it means for an implementation to be correct, at a high level?)

Think, then click!

In the cheapest-path case, we can notice that the costs of all cheapest paths are the same. This enables us to write:

cost(cheapest(GRAPH, G, E)) = 11

which is now robust against multiple implementations of cheapest.


This might be something you were taught to do when implementing cheapest-path algorithms, or it might be something you did on your own, unconsciously. (You might also have been told to ignore this problem, or not told about it at all...) We're not going to stop there, however.

Notice that we just did something subtle and interesting. Even if there are a billion cheapest paths between two vertices in the input graph, they all have that same, minimal length. Our testing strategy has just evolved past naming specific values of output to checking broader properties of output.

Similarly, we can move past specific inputs: randomly generate them. Then, write a function is_valid that takes an arbitrary input, output pair and returns true if and only if the output is a valid solution for the input. Just pipe in a bunch of inputs, and the function will try them all. You can apply this strategy to almost any problem, in any programming language. (For your homework this week, you'll be using Python.) Let's be more careful, though.

Exercise: Is there something else that cheapest needs to guarantee for that input, beyond finding a path with the same cost as our solution?

Think, then click!

We also need to confirm that the path returned by cheapest is indeed a path in the graph!


Exercise: Now take that list of goals, and see if you can outline a function that tests for it. Remember that the function should take the problem input (in this case, a graph and the source and destination vertices) and the output (in this case, a path). You might generate something like this pseudocode:

Think, then click!
isValid : input: (graph, vertex, vertex), output: list(vertex) -> bool
  returns true IFF:
    (1) output.cost == trustedImplementation(input).cost
    (2) every vertex in output is in input's graph
    (3) every step in output is an edge in input
    ... and so on ...
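Translated into Python, that pseudocode might look something like the sketch below. The graph representation (a dict of adjacency dicts) and the trusted_cheapest parameter are assumptions for illustration; adapt them to your own setting.

```python
def path_cost(graph, path):
    # Sum of edge weights along consecutive pairs in the path.
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

def is_valid(graph, source, dest, output_path, trusted_cheapest):
    # The path must actually start and end where requested.
    if not output_path or output_path[0] != source or output_path[-1] != dest:
        return False
    # (2) every vertex in the output must appear in the input's graph
    if any(v not in graph for v in output_path):
        return False
    # (3) every step in the output must be an edge in the input
    if any(b not in graph[a] for a, b in zip(output_path, output_path[1:])):
        return False
    # (1) the output's cost must match the trusted implementation's cost
    return path_cost(graph, output_path) == path_cost(
        graph, trusted_cheapest(graph, source, dest))
```

Note that the checker never needs to enumerate all cheapest paths; it only needs the trusted implementation for the one number that every correct answer shares.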

This style of testing is called Property-Based Testing (PBT). When we're using a trusted implementation—or some other artifact—to either evaluate the output or to help generate useful inputs, it is also a variety of Model-Based Testing (MBT).

Model-Based Testing

There are many techniques under the umbrella of MBT. A model can be another program, a formal specification, or some other type of artifact that we can "run". Often, MBT is used in a more stateful way: to generate sequences of user interactions that drive the system into interesting states.

For now, know that modeling systems can be helpful in generating good tests, in addition to everything else.

There are a few questions, though...

Question: Can we really trust a "trusted" implementation?

No, not completely. It's impossible to reach a hundred percent trust; anybody who tells you otherwise is selling something. Even if you spend years creating a correct-by-construction system, there could be a bug in (say) how it is deployed or connected to other systems.

But often, questions of correctness are really about the transfer of confidence: my old, slow implementation has worked for a couple of years now, and it's probably mostly right. I don't trust my new, optimized implementation at all: maybe it uses an obscure data structure, or a language I'm not familiar with, or maybe I don't even have access to the source code at all.

And anyway, often we don't need recourse to any trusted model; we can just phrase the properties directly.

Exercise: What if we don't have a trusted implementation?

Think, then click!

You can use this approach whenever you can write a function that checks the correctness of a given output. It doesn't need to use an existing implementation (it's just easier to talk about that way). In the next example we won't use a trusted implementation at all!

Input Generation

Now you might wonder: Where do the inputs come from?

Great question! Some we will manually create based on our own cleverness and understanding of the problem. Others, we'll generate randomly.

Random inputs are used for many purposes in software engineering: "fuzz testing", for instance, creates vast quantities of random inputs in an attempt to find crashes and other serious errors. We'll use that same idea here, except that our notion of correctness is usually a bit more nuanced.

Concretely:

A diagram of property-based testing. A random input generator, plus some manually-chosen inputs, are sent to the implementation under test. The outputs are then run through the validator function.

It's important to note that some creativity is still involved here: you need to come up with an is_valid function (the "property"), and you'll almost always want to create some hand-crafted inputs (don't trust a random generator to find the subtle corner cases you already know about!) The strength of this approach lies in its resilience against problems with multiple correct answers, and in its ability to mine for bugs while you sleep. Did your random testing find a bug? Fix it, and then add that input to your list of regression tests. Rinse, repeat.

If we were still thinking in terms of traditional test cases, this would make no sense: where would the outputs come from? Instead, we've created a testing system where concrete outputs aren't something we need to provide. Rather, we check whether the program under test produces any valid output.
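A driver for this workflow might be sketched like so. The generator, implementation, and validator here are placeholders for whatever your problem uses, and the toy usage at the bottom is invented purely for illustration.

```python
import random

# A sketch of a property-based testing loop. Notice that no expected outputs
# are stored anywhere: each trial just asks the validator whether the output
# is valid for the generated input.
def run_pbt(generate_input, implementation, is_valid, trials=1000, seed=0):
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        test_input = generate_input(rng)
        output = implementation(test_input)
        if not is_valid(test_input, output):
            # Save failing inputs: good candidates for regression tests.
            failures.append((test_input, output))
    return failures

# Toy usage: the "implementation" doubles a number; validity means the
# output equals twice the input.
failures = run_pbt(lambda rng: rng.randint(-100, 100),
                   lambda x: x * 2,
                   lambda x, y: y == 2 * x)
```

The seed is fixed so failures are reproducible; in practice, libraries like Hypothesis manage seeding and input shrinking for you.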

The Hypothesis Library

There are PBT libraries for almost every popular language. In this book, we'll be using a library for Python called Hypothesis. Hypothesis has many helper functions to make generating random inputs relatively easy. It's worth spending a little time stepping through the library. Let's test a function from Python itself: the median function in the statistics library, which we began this chapter with. What are some important properties of median?

CSCI 1710: LLMs and Testing

If you're in CSCI 1710, your first homework starts by asking you to generate code using an LLM of your choice, such as ChatGPT. Then, you'll use property-based testing to assess its correctness. To be clear, you will not be graded on the correctness of the code you prompt an LLM to generate. Rather, you will be graded on how good your property-based testing is.

Later in the semester, you'll be using PBT again to test more complex software!

Now let's use Hypothesis to test at least one of those properties. We'll start with this template:

from hypothesis import given, settings
from hypothesis.strategies import integers, lists
from statistics import median

# Tell Hypothesis: inputs for the following function are non-empty lists of integers
@given(lists(integers(), min_size=1)) 
# Tell Hypothesis: run up to 500 random inputs
@settings(max_examples=500)
def test_python_median(input_list):    
    pass

# Because of how Python's imports work, this if statement is needed to prevent 
# the test function from running whenever a module imports this one. This is a 
# common feature in Python modules that are meant to be run as scripts. 
if __name__ == "__main__": # ...if this is the main module, then...
    test_python_median()

Let's start by filling in the shape of the property-based test case:

def test_python_median(input_list):    
    output_median = median(input_list) # call the implementation under test
    print(f'{input_list} -> {output_median}') # for debugging our property function
    if len(input_list) % 2 == 1:
        assert output_median in input_list 
    # The above checks a conditional property. But what if the list length is even?
    # We should be able to do better!

Exercise: Take a moment to try to express what it means for median to be correct in the language of your choice. Then continue on with reading this section.

Expressing properties can often be challenging. After some back and forth, we might reach a candidate function like this:

def test_python_median(input_list):
    output_median = median(input_list)
    print(f'{input_list} -> {output_median}')
    if len(input_list) % 2 == 1:
        assert output_median in input_list
    
    lower_or_eq =  [val for val in input_list if val <= output_median]
    higher_or_eq = [val for val in input_list if val >= output_median]
    assert len(lower_or_eq) >= len(input_list) // 2    # int division, drops decimal part
    assert len(higher_or_eq) >= len(input_list) // 2   # int division, drops decimal part

Unfortunately, there's a problem with this solution. Python's median implementation fails this test! Hypothesis provides a random input on which the function fails: input_list=[9502318016360823, 9502318016360823]. Give it a try! This is what my computer produced; what happens on yours?

Exercise: What do you think is going wrong?

Think, then click!

Here's what my Python console reports:

>>> statistics.median([9502318016360823, 9502318016360823])
9502318016360824.0

I really don't like seeing a number that's larger than both numbers in the input set. But I'm also suspicious of that trailing .0. median has returned a float, not an int. That might matter. But first, we'll try the computation that we might expect median to run:

>>> (9502318016360823*2)/2
9502318016360824.0

What if we force Python to perform integer division?

>>> (9502318016360823*2)//2
9502318016360823

Could this be a floating-point imprecision problem? Let's see if Hypothesis can find another failing input where the values are smaller. We'll change the generator to produce only small numbers, and increase the number of trials hundredfold:

@given(lists(integers(min_value=-1000,max_value=1000), min_size=1))
@settings(max_examples=50000)

No error manifests. That doesn't mean one couldn't, but it sure looks like large numbers make the chance of an error much higher.

The issue is: because Python's statistics.median returns a float, we've inadvertently been testing the accuracy of Python's primitive floating-point division, and floating-point division is known to be imprecise in some cases. It might even manifest differently on different hardware—this is only what happens on my laptop!

Anyway, we have two or three potential fixes:

  • bound the range of potential input values when we test;
  • check equality within some small amount of error you're willing to tolerate (a common trick when writing tests about float values); or
  • change to a library that uses arbitrary-precision numbers, such as a BigNumber type. We could adapt our test fairly easily to that setting, and we'd expect this problem not to occur.

Which is best? I don't really like the idea of arbitrarily limiting the range of input values here, because picking a range would require me to understand the floating-point arithmetic specification a lot more than I do. For instance, how do I know that there exists some number before which this issue can't manifest? How do I know that all processor architectures would produce the same thing?

Between the other two options (adding an error term and changing libraries) it depends on the engineering context we're working in. Changing libraries may have consequences for performance or system design. Testing equality within some small window may be the best option in this case, where we know that many inputs will involve float division.
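The second option is straightforward with Python's built-in math.isclose, which compares floats within a relative or absolute tolerance. A sketch (the tolerance below is an arbitrary choice for illustration, not a recommendation):

```python
import math
from statistics import median

# Instead of exact equality, allow a small relative error when comparing
# the float that median returns against the expected value.
def medians_close(input_list, expected):
    return math.isclose(median(input_list), expected, rel_tol=1e-09)
```

With this helper, the large-integer input that broke exact equality passes: the answer is off by 1 part in roughly 10^16, well within tolerance, while genuinely wrong answers still fail.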

Takeaways

We'll close this section by noticing two things:

First, being precise about what correctness means is powerful. With ordinary unit tests, we're able to think about behavior only point-wise. Here, we need to describe our goals more broadly, and there's a cost to that, but also advantages: comprehensibility, more powerful testing, better coverage, etc. And we can still get value from a partial definition, because we can then at least apply PBT to that portion of the program's behavior.

Second, the very act of trying to precisely express, and test, correctness for median taught us (or reminded us about) something subtle about how our programming language works, which tightened our definition of correctness. Modeling often leads to such a virtuous cycle.

Intro to Modeling Systems (Part 1: Tic-Tac-Toe)

What's a Model?

A model is a representation of a system that faithfully includes some but not all of the system's complexity. There are many different ways to model a system, all of which have different advantages and disadvantages. Think about what a car company does before it produces a new car design. Among other things, it creates multiple models. E.g.,

  • it models the car in some computer-aided design tool; and then
  • creates a physical model of the car, perhaps with clay, for testing in wind tunnels etc.

There may be many different models of a system, all of them focused on something different. As the statisticians say, "all models are wrong, but some models are useful". Learning how to model a system is a key skill for engineers, not just within "formal methods". Abstraction is one of the key tools in Computer Science, and modeling lies at the heart of abstraction.

In this course, the models we build aren't inert; we have tools that we can use to explore and analyze them!

Don't Be Afraid of Imperfect Representations

We don't need to fully model a system to be able to make useful inferences. We can simplify, omit, and abstract concepts/attributes to make models that approximate the system while preserving the fundamentals that we're interested in.

Exercise: If you've studied physics, there's a great example of this in statics and dynamics. Suppose I drop a coin from the top of the science library, and ask you what its velocity will be when it hits the ground. Using the methods you learn in beginning physics, what's something you usefully disregard?

Think, then click!

Air resistance! Friction! We can still get a reasonable approximation for many problems without needing to include that. (And advanced physics adds even more factors that aren't worth considering at this scale.) The model without friction is often enough.
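The frictionless model even gives a simple closed form: dropped from rest at height $h$, the coin hits the ground at speed $v = \sqrt{2gh}$. A quick sketch (the 45-meter building height here is made up for illustration):

```python
import math

# Impact speed of an object dropped from rest, ignoring air resistance:
# v = sqrt(2 * g * h), with g in m/s^2 and h in meters.
def impact_speed(height_m, g=9.8):
    return math.sqrt(2 * g * height_m)

v = impact_speed(45.0)  # hypothetical 45-meter drop
```

The real coin would hit slightly slower because of drag, but for many purposes this approximation is all we need; that's exactly the spirit of a useful model.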

What is a "System"? (Models vs. Implementations)

When we say "systems" in this book, we mean the term broadly. A distributed system (like replication in MongoDB) is a system, but so are user interfaces and hardware devices like CPUs and insulin pumps. Git is a system for version control. The web stack, cryptographic protocols, chemical reactions, the rules of sports and games—these are all systems too!

To help build intuition, let's work with a simple system: the game of tic-tac-toe (also called noughts and crosses). There are many implementations of this game, including this one that I wrote in Python. And, of course, these implementations often have corresponding test suites, like this (incomplete) example.

Exercise: Play a quick game of tic-tac-toe by hand. If you can, find a partner, but if not, then play by yourself.

Notice what just happened. You played the game. In doing so, you ran your own mental implementation of the rules. The result you got was one of many possible games, each with its own specific sequence of legal moves, leading to a particular ending state. Maybe someone won, or maybe the game was a tie. Either way, many different games could have ended with that same board.

Modeling is different from programming. When you're programming traditionally, you give the computer a set of instructions and it follows them. This is true whether you're programming functionally or imperatively, with or without objects, etc. Declarative modeling languages like Forge work differently. The goal of a model isn't to run instructions, but rather to describe the rules that govern systems.

Here's a useful comparison to help reinforce the difference (with thanks to Daniel Jackson):

  • An empty program does nothing.
  • An empty model allows every behavior.

Modeling Tic-Tac-Toe Boards

What are the essential concepts in a game of tic-tac-toe?

Modeling Methodology

When we're first writing a model, we'll start with 5 steps. For each step, I'll give examples from tic-tac-toe and also for binary search trees (which we'll start modeling soon) for contrast.

  • What are the datatypes involved, and their fields?
    • For tic-tac-toe: they might be the 3-by-3 board and the X and O marks that go in board locations.
    • For a binary search tree: they might be the tree nodes and their left and right children.
  • What makes an instance of these datatypes well formed? That is, what conditions are needed for them to not be garbage?
    • For tic-tac-toe, we might require that the indexes used are between 0 and 2, since the board is 3-by-3. (We could just as easily use 1, 2, and 3. I picked 0 as the starting point out of habit, because list indexes start from 0 in the programming languages I tend to use.)
    • For a binary search tree, we might require that every node has at most one left child, at most one right child, a unique parent, and so on.
  • What's a small example of how these datatypes can be instantiated?
    • For tic-tac-toe, the empty board would be an example. So would the board where X moves first into the middle square.
    • For a binary search tree, this might be a tree with only one node, or a 3-node tree where the root's left and right children are leaves.
  • What does the model look like when run?
    • For tic-tac-toe, we should see a board with some number of X and O marks.
    • For a binary search tree, we should see some set of nodes that forms a single tree via left- and right-children.
  • What domain predicates are there? Well-formedness defines conditions that are needed for an instantiation to not be "garbage". But whatever we're modeling surely has domain-specific concepts of its own, which may or may not hold.
    • For tic-tac-toe, we care a great deal if the board is a winning board or not. Similarly, we might care if it looks like someone has cheated.
    • For a binary search tree, we care if the tree is balanced, or if it satisfies the BST invariant.

These steps will get us to a point where we can begin to iterate: working to validate and/or refine the model as we go. It's rare that a model will do everything you need (and do so correctly) on the first try.

Well-formedness vs. domain predicates

Why make this distinction between well-formedness and domain predicates? Because one should always hold in any instance Forge considers, but the other may or may not hold. In fact, we might want to use Forge to verify that a domain predicate always holds! And if we've told Forge that any instance that doesn't satisfy it is garbage, Forge won't find us such an instance.

Datatypes

We might list:

  • the players X and O;
  • the 3-by-3 game board, where players can put their marks;
  • the idea of whose turn it is at any given time; and
  • the idea of who has won the game at any given time.

Now let's add those ideas to a model in Forge!

#lang forge/froglet

The first line of any Forge model will be a #lang line, which says which Forge language the file uses. We'll start with the Froglet language for now. Everything you learn in this language will apply in other Forge languages, so I'll use "Forge" interchangeably.

Now we need a way to talk about the noughts and crosses themselves. So let's add a sig that represents them:

#lang forge/froglet
abstract sig Player {}
one sig X, O extends Player {}

You can think of sig in Forge as declaring a kind of object. A sig can extend another, in which case we say that it is a child of its parent, and child sigs cannot overlap. When a sig is abstract, any member must also be a member of one of that sig's children; in this case, any Player must either be X or O. Finally, a one sig has exactly one member—there's only a single X and O in our model.

We also need a way to represent the game board. We have a few options here: we could create an Index sig, and encode an ordering on those (something like "column A, then column B, then column C"). Another is to use Forge's integer support. Both solutions have their pros and cons. Let's use integers, in part to get some practice with them.

#lang forge/froglet
abstract sig Player {}
one sig X, O extends Player {}

sig Board {
  board: pfunc Int -> Int -> Player
}

Every Board object contains a board field describing the moves made so far. This field is a partial function, or dictionary: for every Board, it maps each (Int, Int) pair to at most one Player.
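If a Python analogy helps (this is just an illustration of the idea of a partial function, not how Forge represents things internally), the board field is like a dictionary with (row, col) keys, where missing keys mean empty squares:

```python
# A Python analogue of the `board` field: a partial function from
# (row, col) pairs to players, represented as a dict. Keys that are
# absent correspond to squares with no mark on them.
board = {(0, 0): "X", (1, 1): "O"}

def mark_at(board, row, col):
    return board.get((row, col))  # None models "no Player here"
```

Just as dict.get returns None for missing keys, looking up an unmarked square in Forge yields nothing, which is what the no operator tests for later.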

Well-formedness

These definitions sketch the overall shape of a board: players, marks on the board, and so on. But not all boards that fit the definition will be valid. For example:

  • Forge integers aren't true mathematical integers, but are bounded by a bitwidth we give whenever we run the tool. So we need to be careful here. We want a classical 3-by-3 board with indexes of (say) 0, 1, and 2, not a board where (e.g.) row -5, column -1 is a valid location.

We'll call these well-formedness constraints. They aren't innately enforced by our sig declarations, but we'll almost always want Forge to enforce them, so that it doesn't find "garbage instances". Let's write a wellformedness predicate:

-- a Board is well-formed if and only if:
pred wellformed[b: Board] {
  -- row and column numbers used are between 0 and 2, inclusive  
  all row, col: Int | {
    (row < 0 or row > 2 or col < 0 or col > 2) 
      implies no b.board[row][col]      
  }
}

Comments in Forge

Forge treats either -- or // as beginning a line-level comment, and /* ... */ as denoting a block comment. This is different from the Python code we saw in the last section! In Forge, # has a different meaning.

This predicate is true of any Board if and only if the constraints above are satisfied. Let's break down the syntax:

  • Constraints can quantify over a domain. E.g., all row, col: Int | ... says that for any pair of integers (up to the given bitwidth), the following condition (...) must hold. Forge also supports, e.g., existential quantification (some), but we don't need that yet. We also have access to standard boolean operators like or, implies, etc.
  • Formulas in Forge always evaluate to a boolean; expressions evaluate to sets. For example,
    • the expression b.board[row][col] evaluates to the Player (if any) with a mark at location (row, col) in board b; but
    • the formula no b.board[row][col] is true if and only if there is no such `Player`.
  • A pred (predicate) in Forge is a helper function that evaluates to a boolean. Thus, its body should always be a formula.

Predicates are declarative

Notice that, rather than describing a process that produces a well-formed board, or even instructions to check well-formedness, we've just given a declarative description of what's necessary and sufficient for a board to be well-formed. If we'd left the predicate body empty, any board would be considered well-formed—there'd be no formulas to enforce!

A Few Examples

Since a predicate is just a function that returns true or false, depending on its arguments and whichever instance Forge is looking at, we can write tests for it the same way we would for any other boolean-valued function. But even if we're not testing, it can be useful to write a small number of examples, so we can build intuition for what the predicate means.

In Forge, examples are automatically run whenever your model executes. They describe basic intent about a given predicate; in this case, let's write two examples in Forge:

  • A board where X has moved 3 times in valid locations, and so ought to be considered well formed.
  • A board where a player has moved in an invalid location, and shouldn't be considered well formed.

Notice that we're not making judgements about the rules being obeyed yet—just about whether our wellformed predicate is behaving the way we expect. And the wellformed predicate isn't aware of things like "taking turns" or "stop after someone has won", etc. It just knows about the valid indexes being 0, 1, and 2.

We'll write those two examples in Forge:

-- Helper to make these examples easier to write
pred all_wellformed { all b: Board | wellformed[b]}

-- all_wellformed should be _true_ for the following instance
example firstRowX_wellformed is {all_wellformed} for {
  Board = `Board0                 -- backquote labels specific atoms
  X = `X      O = `O              -- examples must define all sigs
  Player = X + O                  -- only two kinds of player
  `Board0.board = (0, 0) -> `X +  -- the partial function for the board's
                  (0, 1) -> `X +  -- contents (unmentioned squares must 
                  (0, 2) -> `X    -- remain empty, because we used "=" to say
                                  -- "here's the function for `board0")
}

-- all_wellformed should be _false_ for the following instance
example off_board_not_wellformed is {not all_wellformed} for {
  Board = `Board0 
  X = `X      O = `O 
  Player = X + O
  `Board0.board = (-1, 0) -> `X +
                  (0, 1) -> `X + 
                  (0, 2) -> `X 
}

Test in both directions

Notice that we've got a test that's a positive example and another test that's a negative example. We want to make sure to exercise both cases, or else "always true" or "always false" could pass our suite.

Running Forge

The run command tells Forge to search for an instance satisfying the given constraints:

run { some b: Board | wellformed[b]} 

(If you're curious about how Forge finds solutions, you can find a brief sketch in the Q&A for this chapter.)

When we click the play button in the VSCode extension, the engine solves the constraints and produces a satisfying instance. (Because of differences across solver versions, hardware, etc., it's possible you'll see a different instance than the one shown here.) A browser window should pop up with a visualization. You can also run racket <filename.frg> in the terminal, although we recommend the VSCode extension.

Running Forge on Windows

If you're running on Windows, the Windows-native cmd and PowerShell terminals will not properly load Forge's visualizer. Instead, we suggest using one of many other options on Windows that we've tested and know to work: the VSCode extension (available on the VSCode Marketplace), DrRacket, Git for Windows (e.g., git bash), Windows Subsystem for Linux, or Cygwin.


There are many options for visualization. The default which loads initially is a directed-graph based one:

(TODO: make this clickable to show it bigger? Want to see the whole window, but then the graph is small.)

This isn't very useful; it looks nothing like a tic-tac-toe board! We can make more progress by using the "Table" visualization—which isn't ideal either:

Forge also allows users to make custom visualizations via short JavaScript programs; here's a basic example visualizer for this specific tic-tac-toe model that produces images like this one:

We'll talk more about visualization scripts later. For now, let's proceed. TODO: replace img with one matching the table view TODO: add side-by-side CSS


This instance contains a single board, and it has 9 entries. Player O has moved in all of them (the 0 suffix of O0 in the display is an artifact of how Forge's engine works; ignore it for now). It's worth noticing two things:

  • This board doesn't look quite right: player O occupies all the squares. We might ask: has player O been cheating? But the fact is that this board satisfies the constraints we have written so far. Forge produces it simply because our model isn't yet restrictive enough, and for no other reason. "Cheating" doesn't exist yet.
  • We didn't say how to find that instance. We just said what we wanted, and the tool performed some kind of search to find it. So far the objects are simple, and the constraints basic, but hopefully the power of the idea is coming into focus.

Why Board3 when there's only one board?

Here, we see Board3 because the solver had a few options to pick from: we never said there should only ever be one Board, after all. So, under the hood, it was considering the potential existence of multiple boards. And then it happened to pick this one to exist in this instance.

Reflection: Implementation vs. Model

So far we've just modeled boards, not full games. But we can still contrast our work here against the Python implementation of tic-tac-toe shared above.

Exercise: How do the data-structure choices, and type declarations, in the implementation compare with the essence of the game as reflected in the model? What is shared, and what is different?

Spend a minute identifying at least one commonality and at least one difference, then move on.

Domain Predicates

Now let's write predicates that describe important ideas in the domain. What's important in the game of tic-tac-toe? Here are a few things.

Starting Boards

What would it mean to be a starting state in a game? The board is empty:

pred starting[s: Board] {
  all row, col: Int | 
    no s.board[row][col]
}

Turns

How do we tell when it's a given player's turn? It's X's turn when there are the same number of each mark on the board:

pred XTurn[s: Board] {
  #{row, col: Int | s.board[row][col] = X} =
  #{row, col: Int | s.board[row][col] = O}
}

Here, we're measuring the size of 2 sets. The {row, col: Int | ...} syntax is called a set comprehension. A set comprehension defines a set. We're defining the set of row-column pairs where the board contains one of the player marks. The # operator gives the size of these sets, which we then compare.
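Forge's set comprehensions and # operator parallel Python's comprehensions and len. If we picture a board as a Python dictionary from (row, col) pairs to marks (an analogy only, not Forge's internals), XTurn might read:

```python
# A Python analogue of the XTurn predicate: build the set of (row, col)
# pairs holding each mark, then compare the sizes of those sets.
def x_turn(board):
    xs = {rc for rc, player in board.items() if player == "X"}
    os = {rc for rc, player in board.items() if player == "O"}
    return len(xs) == len(os)
```

The Forge version quantifies over all integer pairs implicitly; the Python analogue only sees the keys present in the dict, but the counting idea is the same.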

Exercise: Is it enough to say that OTurn is the negation of XTurn? That is, we could write: pred OTurn[s: Board] { not XTurn[s] }. This seems reasonable enough; why might we not want to write this?

Think, then click!

Because we defined X's turn to be when the number of X's and O's on the board are in balance. So any other board would be O's turn, including ones that ought to be illegal, once we start defining moves of the game. Instead, let's say something like this:

pred OTurn[s: Board] {
  -- It's O's turn if X has moved once more often than O has
  #{row, col: Int | s.board[row][col] = X} =
  add[#{row, col: Int | s.board[row][col] = O}, 1]
}

Integer addition

Forge supports arithmetic operations on integers like add. Forge integers are signed (i.e., can be positive or negative) and are bounded by a bit width, which defaults to 4 bits. The number of available integers is always $2^k$, where $k$ is the bit width.

Forge follows the 2's complement arithmetic convention, which means that the available integers are split evenly between positive and negative numbers, but counting 0 as "positive". So with 4 bits, we can represent numbers between -8 and 7 (inclusive).

This means that (while it doesn't matter for this model yet), arithmetic operations can overflow—just like primitive integers in languages like Java! For example, if we're working with 4-bit integers, then add[7,1] will be -8. You can experiment with this in the visualizer's evaluator, which we'll be using a lot after the initial modeling tour is done.
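If you'd like to predict what Forge's arithmetic will do, you can mimic k-bit 2's complement wraparound in Python (a sketch for experimentation, not Forge's actual implementation):

```python
# 2's-complement wraparound for k-bit integers: reduce modulo 2^k, then
# shift results at or above 2^(k-1) down into the negative range.
def wrap(n, bitwidth=4):
    m = n % (2 ** bitwidth)
    return m - 2 ** bitwidth if m >= 2 ** (bitwidth - 1) else m

result = wrap(7 + 1)  # the 4-bit analogue of add[7, 1], which wraps to -8
```

With the default bitwidth of 4, wrap(7 + 1) gives -8, matching the add[7,1] example above, and wrap(-8 - 1) wraps the other way around to 7.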

Use add for addition, not +

Don't try to use + for addition in any Forge language. Use add instead; this is because + is reserved for something else (which we'll explain later).

Winning the Game

What does it mean to win? A player has won on a given board if:

  • they have placed their mark in all 3 columns of a row;
  • they have placed their mark in all 3 rows of a column; or
  • they have placed their mark in all 3 squares of a diagonal.

We'll express this in a winner predicate that takes the current board and a player name. Let's also define a couple helper predicates along the way:

pred winRow[s: Board, p: Player] {
  -- note we cannot use `all` here because there are more Ints  
  some row: Int | {
    s.board[row][0] = p
    s.board[row][1] = p
    s.board[row][2] = p
  }
}

pred winCol[s: Board, p: Player] {
  some column: Int | {
    s.board[0][column] = p
    s.board[1][column] = p
    s.board[2][column] = p
  }      
}

pred winner[s: Board, p: Player] {
  winRow[s, p]
  or
  winCol[s, p]
  or 
  {
    s.board[0][0] = p
    s.board[1][1] = p
    s.board[2][2] = p
  } or {
    s.board[0][2] = p
    s.board[1][1] = p
    s.board[2][0] = p
  }  
}

After writing these domain predicates, we're reaching a fairly complete model for a single tic-tac-toe board. Let's decide how to fix the issue we saw above (the reason why OTurn couldn't be the negation of XTurn): perhaps a player has moved too often.

Should we add something like OTurn[s] or XTurn[s] to our well-formedness predicate? If we did, and then enforced wellformedness for all boards, we would exclude "cheating" instances where a player has more moves on the board than are allowed. Whether that's what we want depends on how we intend to use the wellformed predicate:

  • If we were only ever generating valid boards, a cheating state might well be spurious, or at least undesirable. In that case, we might prevent such states in wellformed and rule it out.
  • If we were generating arbitrary (not necessarily valid) boards, being able to see a cheating state might be useful. In that case, we'd leave it out of wellformed.
  • If we're interested in verification, e.g., we are asking whether the game of Tic-Tac-Toe enables ever reaching a cheating board, we shouldn't add not cheating to wellformed; because wellformed also excludes garbage boards, we'd probably use it in our verification—in which case, Forge will never find us a counterexample!

Modeling vs. PBT

Notice the similarity between this issue and what we do in property-based testing. Here, we're forced to distinguish between what a reasonable board is (analogous to the generator's output in PBT) and what a reasonable behavior is (analogous to the validity predicate in PBT). One narrows the scope of possible worlds to avoid true "garbage"; the other checks whether the system behaves as expected in one of those worlds.
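To make the analogy concrete, here is a hypothetical PBT-style sketch in Python (the generator and predicate names are illustrative only, not from any real library): one predicate narrows the generated worlds, like wellformed, and the other states the behavior we expect within those worlds, like a validity property.

```python
import random

def random_board():
    # generator: a 3x3 grid where each cell is "X", "O", or empty (None)
    return [[random.choice(["X", "O", None]) for _ in range(3)] for _ in range(3)]

def is_valid_board(board):
    # "generator output" check: rules out true garbage (wrong shape, bad cells)
    return len(board) == 3 and all(
        len(row) == 3 and all(c in ("X", "O", None) for c in row) for row in board)

def moves_balanced(board):
    # "validity predicate" check: X moves first, so X has 0 or 1 more marks than O
    xs = sum(row.count("X") for row in board)
    os = sum(row.count("O") for row in board)
    return xs == os or xs == os + 1

for _ in range(100):
    b = random_board()
    if is_valid_board(b):       # first, narrow the scope of possible worlds...
        _ = moves_balanced(b)   # ...then evaluate the behavior of interest
```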

We'll come back to this later, when we've had a bit more modeling experience. For now, let's separate our goal into a new predicate called balanced, and add it to our run command above so that Forge will find us an instance where some board is both balanced and wellformed:

pred balanced[s: Board] {
  XTurn[s] or OTurn[s]
}
run { some b: Board | wellformed[b] and balanced[b]} 

If we click the "Next" button a few times, we see that not all is well: we're getting boards where wellformed is violated (e.g., entries at negative rows, or multiple moves in one square). Why is this happening?

We're getting this because of how the run was phrased. We said to find an instance where some board was well-formed and balanced, not one where all boards were. Our run is satisfied by any instance where at least one Board is well-formed and balanced; the others won't affect the truth of the constraint. By default, Forge will find instances with up to 4 Boards. So we can fix the problem either by telling Forge to find instances with only 1 Board:

run { some b: Board | wellformed[b] and balanced[b]} 
for exactly 1 Board

or by saying that all boards must be well-formed and balanced:

run { all b: Board | wellformed[b] and balanced[b]} 
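The distinction between the two runs mirrors any versus all in Python. Here's a tiny illustrative sketch (the ok predicate is a stand-in for "wellformed and balanced", not anything from Forge):

```python
def ok(b):
    # stand-in predicate for "this board is wellformed and balanced"
    return b % 2 == 0

boards = [1, 2, 3]
print(any(ok(b) for b in boards))  # True: satisfied if *some* board is ok
print(all(ok(b) for b in boards))  # False: requires *every* board to be ok
```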

Practice with run

The run command can be used to give Forge more detailed instructions for its search.

No Boards

Exercise: Is it possible for an instance with no boards to still satisfy constraints like these?

run {    
     all b: Board | {
         -- X has won, and the board looks OK
         wellformed[b]
         winner[b, X]
         balanced[b]    
     }
 }
Think, then click!

Yes! There aren't any boards, so there's no obligation for anything to satisfy the constraints inside the quantifier. You can think of the all as something like a for loop in Java or the all() function in Python: it checks every Board in the instance. If there aren't any, there's nothing to check—return true.
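The analogy is directly checkable in Python: all() over an empty collection is vacuously true, no matter what the body would say.

```python
boards = []  # an instance with no Boards

def wellformed(b):
    return False  # even a body that's never satisfiable...

print(all(wellformed(b) for b in boards))  # ...holds vacuously: True
```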

Adding More

This addition also requires that X moved in the middle of the board:

run {    
     all b: Board | {
         -- X has won, and the board looks OK
         wellformed[b]
         winner[b, X]
         balanced[b]
         -- X started in the middle
         b.board[1][1] = X
     }
 } for exactly 2 Board

Notice that, because we said exactly 2 Board here, Forge must find instances containing 2 tic-tac-toe boards, and both of them must satisfy the constraints: wellformedness, X moving in the middle, etc. You could ask for a board where X hasn't won by adding not winner[b, X].

We'll come back to tic-tac-toe soon. The next section will cover a second static example.

Testing with assertions

Before we move on, I want to quickly note that example isn't the only way you can write tests in Forge. The example construct is powerful if you want to write "pointwise" tests in terms of single instances, but can quickly become verbose. If you know what properties you're interested in testing for, you can encode that into an assert in Forge. For example:

// I want to check that this predicate is satisfiable for some set of arguments, 
// but I don't care about the specific instance(s) that satisfy it.
assert {some b: Board | XTurn[b]} is sat

// I want to confirm that no board can satisfy both OTurn and XTurn at once,
// i.e., that the two predicates are mutually exclusive
assert {some b: Board | XTurn[b] and OTurn[b]} is unsat

Intro to Modeling Systems (Part 2: BSTs)

Now that we've written our first model—tic-tac-toe boards—let's switch to something a bit more serious: binary search trees. A binary search tree (BST) is a binary tree with an added property about its structure that allows it to efficiently answer many search queries related to the values it stores. Here's an example, drawn by hand:

a hand-drawn binary search tree

Each node of the tree holds some value that the tree supports searching for. We'll call this value the search key, or just the key for each node. The common ancestor of every node in the tree is called the root.

This is obviously a binary tree, since it is a tree where every node has at most 2 children. What makes it a binary search tree is the invariant that every node N obeys:

  • all left-descendants of N have a key less than N's key; and
  • all right-descendants of N have a key greater than or equal to N's key.

A common mistake

When you're first learning about binary search trees, it's easy to phrase the invariant wrong. For all nodes N:

  • the left child of N (if any) has a key less than N's key; and
  • the right child of N (if any) has a key greater than or equal to N's key.

With experience, it's straightforward to see that this is too weak; search will break. But at first that isn't so obvious. It would be interesting if we could use Forge to help us understand the difference and its impact on searching the tree.
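To preview why the child-only phrasing is too weak, here is a small Python sketch (not Forge): a tree in which every parent-child pair satisfies the child-only invariant, yet a standard BST search misses a key that is present.

```python
# A tiny binary tree as nested dicts. Every parent-child pair obeys the
# *child-only* invariant (5 < 10; 12 >= 5), but 12 sits in the left subtree
# of 10, violating the *descendant* version of the invariant.
tree = {"key": 10,
        "left": {"key": 5, "left": None,
                 "right": {"key": 12, "left": None, "right": None}},
        "right": None}

def bst_search(node, target):
    # standard BST descent, which relies on the descendant invariant
    if node is None:
        return False
    if node["key"] == target:
        return True
    return bst_search(node["left"] if target < node["key"] else node["right"], target)

print(bst_search(tree, 12))  # False: 12 is in the tree, but search misses it
```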

Let's start modeling. As with programming, it's a good idea to start simple, and add complexity and optimization after. So we'll start with plain binary trees, and then add the invariant.

the recipe

Like with tic-tac-toe, we'll follow this rough 5-step progression:

  • define the pertinent datatypes and fields;
  • define a well-formedness predicate;
  • write some examples;
  • run and exercise the base model;
  • write domain predicates.

Keep in mind that this isn't a strict "waterfall" style progression; we may return to previous steps if we discover it's necessary.

Datatypes

A binary tree is made up of nodes. Each node in the tree has at most one left child and at most one right child. While nodes in the tree can hold values of most any type, for simplicity we'll stick to integers.

Unlike in tic-tac-toe, this definition is recursive:

#lang forge/froglet
sig Node {
  key: one Int,     -- every node has some key 
  left: lone Node,  -- every node has at most one left-child
  right: lone Node  -- every node has at most one right-child
}

Terminology Reminder

Recall that a sig is a datatype, each of which may have a set of fields. Here, we're saying that there is a datatype called Node, and that every Node has a key, left, and right field.

Wellformedness for Binary Trees

What makes a binary tree a binary tree? We might start by saying that:

  • it's single-tree-shaped: there are no cycles and all nodes have at most one parent node; and
  • it's connected: all non-root nodes have a common ancestor.

It's sometimes useful to write domain predicates early, and then use them to define wellformedness more clearly. For example, it might be useful to write a helper that describes what it means for a node to be a root node, i.e., the common ancestor of every node in the tree:

#lang forge/froglet
sig Node {
  key: one Int,     -- every node has some key 
  left: lone Node,  -- every node has at most one left-child
  right: lone Node  -- every node has at most one right-child
}

pred isRoot[n: Node] {
  -- a node is a root if it is no other node's child (it has no parent)
  no n2: Node | n = n2.left or n = n2.right
}

Then we'll use the isRoot helper in our wellformed predicate. But to write this predicate, there's a new challenge. We'll need to express constraints like "no node can reach itself via left or right fields". So far we've only spoken of a node's immediate left or right child. Instead, we now need a way to talk about reachability over any number of left or right fields. Forge provides a helper, reachable, that makes this straightforward.

The reachable built-in

The built-in reachable predicate returns true if and only if its first argument is reachable from its second argument, via all of the remaining arguments. Thus, reachable[n1, anc, left, right] means: "anc can reach n1 via some sequence of left and right fields."

For reasons we'll explore later, reachable can be subtle; if you're curious now, see the Static Models Q&A for a discussion of this.
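If it helps to relate this to programming, reachability over a set of fields is just graph search. Here is a rough Python analogue of reachable[n1, anc, left, right] (illustrative only; nodes are dicts and fields are dict keys):

```python
# Python analogue of Forge's reachable[target, start, left, right]:
# can `start` reach `target` via *one or more* applications of the fields?
def reachable(target, start, *fields):
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for field in fields:
            nxt = node.get(field)  # follow one field (may be None/absent)
            if nxt is not None and id(nxt) not in seen:
                if nxt is target:
                    return True
                seen.add(id(nxt))
                frontier.append(nxt)
    return False  # note: start does NOT reach itself unless there is a cycle
```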

Using reachable, we can now write:

pred wellformed {
  -- no cycles: no node can reach itself via a succession of left and right fields
  all n: Node | not reachable[n, n, left, right] 
  
  -- all non-root nodes have a common ancestor from which both are reachable
  -- the "disj" keyword means that n1 and n2 must be _different_
  all disj n1, n2: Node | (not isRoot[n1] and not isRoot[n2]) implies {
    some anc: Node | reachable[n1, anc, left, right] and 
                     reachable[n2, anc, left, right] }

  -- nodes have a unique parent (if any)
  all disj n1, n2, n3: Node | 
    not ((n1.left = n3 or n1.right = n3) and (n2.left = n3 or n2.right = n3))
}

Write an example or two

Let's write a few examples of well-formed and non-well-formed trees. I've listed some possibilities below.

Are these examples enough?

Just like with testing a program, it's not always immediately clear when to stop testing a model. Fortunately, Forge gives us the ability to explore and exercise the model more thoroughly than just running a program does. So, while we're not completely out of danger, we do have new tools to protect ourselves with.

Positive examples

A binary tree with no nodes should be considered well-formed.

example p_no_nodes is wellformed for {
  no Node  -- there are no nodes in the tree; it is empty
}

Drawing this one wouldn't be very interesting.

A binary tree with a single node should be considered well-formed.

example p_one_nodes is wellformed for {
  Node = `Node0 -- there is exactly one node in the tree, named "Node0".
  no left       -- there are no left-children
  no right      -- there are no right-children
}

If we were going to draw the single-node example, we might draw it something like this:

a tree with a single node and no edges

In fact, this is what Forge's default visualizer can generate. Notice that the node has:

  • a name or identity, which we supplied when we named it Node0 in the example; and
  • a value for its key field, which we did not supply (and so Forge filled in).

Be careful not to confuse these! There's a rough analogy to programming: it's very possible (especially if we have a buggy program or model) that there might be different nodes with the same key value.

(TODO: decide: discussion of partial vs. total examples goes where?)

A binary tree with more than one row should be considered well-formed.

example p_multi_row is wellformed for {
  Node = `Node0 +                               -- row 0
         `Node1 + `Node2 +                      -- row 1
         `Node3 + `Node4 + `Node5 + `Node6      -- row 2
  
  -- Define the child relationships (and lack thereof, for leaves)
  -- This is a bit verbose; we'll learn more concise syntax for this soon!
  `Node0.left = `Node1 
  `Node0.right = `Node2
  `Node1.left = `Node3
  `Node1.right = `Node4
  `Node2.left = `Node5
  `Node2.right = `Node6
  no `Node3.left  no `Node3.right 
  no `Node4.left  no `Node4.right 
  no `Node5.left  no `Node5.right 
  no `Node6.left  no `Node6.right 
}
a tree with more than one row

Wait a moment; there's something strange here. What do you notice about the way we've visualized this tree?

Think, then click!

That visualization is not how we'd choose to draw the tree: it has the left field to the right and the right field to the left! This is because we used Forge's default visualizer. By default, Forge has no way to understand what "left" and "right" mean. We'll come back to this problem soon.

An unbalanced binary tree is still well-formed.

When we draw binary trees, often we draw them in a balanced way: nice and "bushy", with roughly even distribution of nodes to the left and right. But an unbalanced tree is still a tree, and we should make sure it counts as one.

example p_unbalanced_chain is wellformed for {
  Node = `Node0 + `Node1 + `Node2 + `Node3
  
  -- Form a long chain; it is still a binary tree.
  `Node0.left = `Node1 
  no `Node0.right 
  `Node1.left = `Node2
  no `Node1.right
  `Node2.left = `Node3
  no `Node2.right 
  
  no `Node3.left  no `Node3.right 
}
an unbalanced chain is still a tree

Negative examples

It's best to write some positive and negative examples. Why? Well, suppose you needed to test a method or function that returned a boolean, like checking whether an integer is even. Here's an example in Python:

def is_even(x: int) -> bool: return x % 2 == 0

What's wrong with this test suite?

assert is_even(0) == True 
assert is_even(2) == True 
assert is_even(10000) == True 
assert is_even(-10000) == True 

The problem isn't only the size of the suite! By testing only values for which we expect True to be returned, we're neglecting half the problem. We'd never catch buggy implementations like this one:

def is_even(x: int) -> bool: return True

Forge predicates are very like boolean-valued functions, so it's important to exercise them in both directions. Here are some negative examples:

A single node that is its own left-child is not well-formed.

example n_own_left is {not wellformed} for {
  Node = `Node0 
  `Node0.left = `Node0
  no `Node0.right
}
a node that is its own left child

A single node that is its own right-child is not well-formed.

example n_own_right is {not wellformed} for {
  Node = `Node0 
  no `Node0.left
  `Node0.right = `Node0
}
a node that is its own right child

A single node that's reachable via a longer cycle using both left- and right-children is not well-formed.

example n_mixed_cycle is {not wellformed} for {
  Node = `Node0 + `Node1 + `Node2
  
  `Node0.left = `Node1 
  no `Node0.right 
  no `Node1.left
  `Node1.right = `Node2
  
  `Node2.left = `Node0
  no `Node2.right 
}
a cycle using both left and right

A "forest" of multiple, disconnected trees is not well-formed.

example n_forest is {not wellformed} for {
  Node = `Node0 + `Node1
  no `Node0.left
  no `Node0.right 
  no `Node1.left 
  no `Node1.right 
}
a forest, not a single tree

Sometimes it helps to start with an example.

We wouldn't normally be able to check these examples until we'd finished writing the wellformed predicate, but it can still be useful to create a few examples first, to help guide the constraints you write. Binary trees are a quite simple domain; imagine modeling something like the Java class system. Things can get tricky fast, and it's good to have a few concrete cases in mind.

Run the examples

Click the run button, and Forge will check that all of the examples satisfy (or, for the negative ones, fail to satisfy) the wellformed predicate. One fails, but why? Notice two things:

  • The failing example is a negative one. We expected this instance to be ruled out by wellformed, but it was not. This points to a potential under-constraint bug in wellformed.
  • We intended the example to fail because it contains separate, disconnected trees. This gives us a hint about the nature of the missing constraint. Except...

We already added a constraint that forces connectivity. Didn't we?

  all disj n1, n2: Node | (not isRoot[n1] and not isRoot[n2]) implies {
    some anc: Node | reachable[n1, anc, left, right] and 
                     reachable[n2, anc, left, right] }

So what's the problem?

Think, then click!

This constraint only applies to pairs of non-root nodes. That is, any two non-root nodes must have a common ancestor. So we ruled out forests of separate, bushy trees, but we neglected to exclude isolated roots!


In modeling, it's common for there to be a few ways to fix problems like this. We could go back and edit the constraint we wrote before, or we could write a new constraint to handle roots. In this case, let's edit the original. We said that any two non-roots have a common ancestor. Why did we say "non-root"? Because if one of the nodes happened to be a root, it would have no such ancestors.

What if we allowed the node itself to count as the common ancestor? Then we would have two obligations (as before), but each would have another way to become true.

  -- for _any_ pair of nodes, there is some ancestor node, such that...
  all disj n1, n2: Node | {
    some anc: Node | { 
      -- either n1 is the ancestor itself, or the ancestor reaches n1...
      ((n1 = anc) or reachable[n1, anc, left, right])
      -- ...and either n2 is the ancestor itself, or the ancestor reaches n2
      ((n2 = anc) or reachable[n2, anc, left, right]) 
    } }

Now all of our examples pass. While that doesn't mean the constraints are exactly right yet, it does increase our confidence in them.

View some instances

Before we move on, let's at least look at some new instances of wellformed, as generated by Forge. By viewing a few of these, we can often spot issues in the initial stages of a model.

-- View a tree or two
run {wellformed} for exactly 8 Node

The run command searches for instances that satisfy the constraints it is given, and then automatically opens the visualizer to explore those instances.

Visualization

By default, the visualizer will show nearly all relationships as arcs; e.g., the key field of each node will be shown as an arc from the node to the (numeric) key. If you want, you can clean this up a bit by opening the Theme drawer, selecting the key field, and checking to view the field as an attribute.

Here's something you might see in one of the instances:

A fragment of an instance visualization, showing a node whose left and right children are the same

This doesn't look right, does it? We have a node whose left and right child are the same; that's not tree-like. Again, we have an under-constraint bug: there are instances permitted by wellformed that shouldn't be. In fact, we completely forgot to add a constraint like:

  -- left+right differ (unless both are empty)
  all n: Node | some n.left => n.left != n.right 

With that constraint added to wellformed, we don't see any more wrong-looking binary trees.

Validation

So far we've tested our model in two ways:

  • checking that specific instances satisfy, or don't satisfy, a Forge predicate; and
  • manually viewing generated instances.

In Forge, we have recourse to more powerful techniques.

Notice that, when we were writing binary_tree, we never said explicitly that there must be a single unique root in the instance. It should be true, of course, that such a root exists and is unique. But that is (or should be!) a consequence of what we wrote. In Forge, this is easy to test:

-- Run a test: our predicate enforces a unique root exists (if any node exists)
pred unique_root {   
  no Node or {
    one root: Node | 
      all other: Node-root | other in descendantsOf[root]}}
assert binary_tree is sufficient for unique_root for 5 Node  

This passes; there is no counterexample using up to 5 Nodes. Our confidence increases.

Different ways of writing predicates

You might be thinking that we could have just added the unique-root property as a constraint to wellformed directly! That's a reasonable thought, and it's true that we could have. I left it out deliberately here, for a few reasons:

  • It's usually a good idea to not overload predicates with constraints that are really unnecessary. There's some wiggle room here in the service of making your model robust, but if we had added every possible property we could think of for a tree to have, the predicate would have become quite unwieldy, harder to understand, and (vitally) harder to debug.
  • When we start modeling, we often don't know exactly what we want. Sure, we might be able to describe it in broad terms, but not always precisely. And, of course...
  • it allowed demonstration of this general technique: checking that one predicate enforces another.

Now our model is looking pretty good, although we haven't yet added the "search" part of "binary search tree".

More Domain Predicates: Search Invariants

Let's express our two alternative BST invariants. As a reminder, they were:

Version 1 For all nodes N:

  • all left-descendants of N have a key less than N's key; and
  • all right-descendants of N have a key greater than or equal to N's key.

Version 2 For all nodes N:

  • the left child of N (if any) has a key less than N's key; and
  • the right child of N (if any) has a key greater than or equal to N's key.

Notice that both are phrased in terms of single nodes, and should apply to all nodes. For convenience, we'll split these up into 2 predicates each: one to represent the per-node requirement, and another to represent the global requirement.

pred invariant_v1[n: Node] {
  -- "Every node's left-descendants..." (if any)
  some n.left => {
    n.left.key < n.key
    all d: Node | reachable[d, n.left, left, right] => d.key < n.key
  }
  -- "Every node's right-descendants..." (if any)
  some n.right => {
    n.right.key >= n.key
    all d: Node | reachable[d, n.right, left, right] => d.key >= n.key
  }
}
pred binary_search_tree_v1 {
  binary_tree  -- a binary tree, with an added invariant
  all n: Node | invariant_v1[n]  
}

Here's the same pair of predicates for the second (wrong) invariant:

pred invariant_v2[n: Node] {
  -- "Every node's immediate children..."
  some n.left implies n.left.key < n.key
  some n.right implies n.right.key >= n.key
}
pred binary_search_tree_v2 {
  binary_tree  -- a binary tree, with an added invariant
  all n: Node | invariant_v2[n]
}

Semantic Differencing

Forge supports a useful trick: comparing the meaning of two different predicates. What do we mean by "meaning"? Suppose that we ran diff on the two versions above. We'd get a report of where the text of the two differed. But that isn't very informative; we'd really like to know which actual binary trees the two disagree on. That's a better way of understanding how the meaning of the two might differ, and start to grasp the consequences.

-- Get examples of the difference between the two. Here we name the run "bstdiff".
bstdiff: run {not { binary_search_tree_v1 iff binary_search_tree_v2}} for 5 Node 
-- But how do they differ? We'd expect the first invariant to be _stronger_ than the second:
v1_is_stronger: assert binary_search_tree_v1 is sufficient for binary_search_tree_v2 for 5 Node 

The run command produces an instance (in fact, many) where the two invariants disagree. The v1_is_stronger test passes, indicating that version 1 is strictly more selective than version 2: no matter how many times we clicked "Next", we'd only see trees where binary_search_tree_v2 is satisfied but binary_search_tree_v1 isn't.

Differencing for Debugging

This simple technique really is powerful. You might use it to check whether two versions of the same constraint are equivalent when debugging or optimizing your model. We'll use this idea in a few examples later, too.
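The idea isn't Forge-specific. Here is a minimal Python analogue of semantic differencing over a bounded domain (the two predicates are illustrative stand-ins, not our tree invariants): exhaustively search for disagreements, and check an implication.

```python
def pred_v1(x):  # illustrative "stronger" predicate
    return x % 4 == 0

def pred_v2(x):  # illustrative "weaker" predicate
    return x % 2 == 0

domain = range(-16, 16)  # a bounded search space, like a Forge scope

# the "run": find concrete inputs where the two predicates disagree
disagreements = [x for x in domain if pred_v1(x) != pred_v2(x)]

# the "assert ... is sufficient for": v1 implies v2 everywhere in the domain
v1_sufficient_for_v2 = all(pred_v2(x) for x in domain if pred_v1(x))

print(disagreements[:3], v1_sufficient_for_v2)
```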

We'll return to binary search trees soon, to model how the recursive descent search works. That will let us see how the different invariants impact the correctness of search. But first, let's get a bit more experience with Forge.

Ripple-Carry Adder

Let's model a third system in Froglet. We'll focus on something even more concrete, something that is implemented in hardware: a circuit, called a ripple-carry adder (RCA), for adding together two numbers. Along the way, even though the adder doesn't "change", we'll still learn a useful technique for modeling systems that change over time.

To understand an RCA, let's first think about adding together a pair of one-bit numbers. We might draw a table with four rows to represent this:

  Input Bit A   Input Bit B   Result Bit
  0             0             0
  0             1             1
  1             0             1
  1             1             2

But, wait a moment. If we're building this into a circuit, and these inputs and outputs are single bits, we can't return 2 as the result. This is similar to how we might manually add 12345 and 67890 on paper, carrying a 1 in a few places...

Notice how, on paper, we sweep from right to left. That is, we handle the least-significant digits first.

Exercise: Why is that? (Your first answer may be: "Because that's how you do it." But hold yourself to a higher standard. Was there a good reason to start on the right, and move left, rather than the other way around?)

Think, then click!

Carrying! If we started to the left, we'd give answers for those digits prematurely. (To see this, try doing the above arithmetic by hand again, but starting on the left and moving right. You'll need to carry, as before, but it will be too late!)


We need to carry a bit with value 1 in the 2s place.

  Input Bit A   Input Bit B   Result Bit   Carry Bit (double value!)
  0             0             0            0
  0             1             1            0
  1             0             1            0
  1             1             0            1

Suppose we've built a circuit like the above; this is called a full adder (FullAdder).

Building circuits

Exactly how we build that circuit is outside the scope of this example. Generally, we build them with logic gates: tiny devices that implement boolean operators like "and", "not", etc.

Now the question is: how do we build an adder that can handle numbers of the sizes that real computers use: 8-bit, 32-bit, or even 64-bit values? The answer is that we'll chain together multiple adder circuits like the above, letting the carry bits "ripple" forward as an extra, 3rd input to all the adders except the first one. E.g., if we were adding together a pair of 4-bit numbers—4 and 5, say—we'd chain together 4 adders like so:

Notice that each full adder accepts 3 input bits, just like in the above table:

  • a bit from the first number;
  • a bit from the second number; and
  • a carry bit.

Each full adder has 2 output bits:

  • the value at this bit (1s place, 2s place, etc.); and
  • the carry bit, if applicable.

Our task here is to model this circuit in Forge, and confirm that it actually works correctly.
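Before modeling the circuit in Forge, we can sanity-check the design itself with a short Python sketch (not part of the Forge model) that chains full adders and compares the result against ordinary integer addition:

```python
def full_adder(a, b, carry_in):
    # one full adder: returns (sum_bit, carry_out) for three input bits
    total = a + b + carry_in
    return total % 2, total // 2

def ripple_carry_add(a_bits, b_bits):
    # a_bits/b_bits are little-endian lists of bits (1s place first);
    # the carry "ripples" forward from each adder to the next
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

# 4 + 5, as little-endian 4-bit values
sum_bits, carry = ripple_carry_add([0, 0, 1, 0], [1, 0, 1, 0])
print(sum_bits, carry)  # [1, 0, 0, 1] and 0, i.e., 9 with no overflow
```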

Circuits aren't easy

This might look "obvious", but there are things that can go wrong even at this level.

If you've studied physics or electrical engineering, you might also see that this model won't match reality: it takes time for the signals to propagate between adders, and this delay can cause serious problems if the chain of adders is too long. We'll address that with a new, more sophisticated model, later.

Datatypes

We'll start by defining a data type—Digit—for the wire values, which can be either T or F (short for "true" and "false"; you can also think of these as representing "1" and "0" or "high" and "low").

abstract sig Digit {}
one sig T, F extends Digit {}

Then we'll define a sig for full adders, which will be chained together to form the ripple-carry adder. We'll give each full adder fields representing its input bits and output bits:

sig FullAdder { 
  -- input value bits 
  a_in, b_in: one Digit,  
  -- input carry bit
  carry_in: one Digit,
  -- output (sum) value
  sum_out: one Digit,
  -- output carry bit 
  carry_out: one Digit
}

Digit is not the same as boolean!

Beware confusing the Digit sig we created, and the T and F values in it, with the result of evaluating Forge constraints. Forge doesn't "know" anything special about T or F; Digit is just another datatype. If we write something like (some FullAdder) = T, Forge will give an error message. This is because, to Forge, T is just another value we defined in the model. Instead, we write just (some FullAdder) to say "there is some full adder in the instance".

This will come up again as we continue to develop the model.

Finally, we'll define the ripple-carry adder chain:

one sig RCA {
  -- the first full adder in the chain
  firstAdder: one FullAdder,
  -- the next full adder in the chain (if any)
  nextAdder: pfunc FullAdder -> FullAdder
}

Reminder

Recall that a pfunc field is a partial function, sort of like a dictionary: every input is mapped to at most one output.

Notice that there is only ever one ripple-carry adder in an instance, and that it has fields that define which full adder comes first (i.e., operates on the 1s place), and what the succession is. We will probably need to enforce what these mean once we start defining wellformedness.

Wellformedness

What do we need to encode in a wellformed predicate? Right now, it seems that nothing has told Forge that firstAdder should really be the first adder, nor that nextAdder defines a linear path through all the full adders. So we should probably start with those two facts.

pred wellformed {
  -- The RCA's firstAdder is "upstream" from all other FullAdders
  all fa: FullAdder | (fa != RCA.firstAdder) implies reachable[fa, RCA.firstAdder, RCA.nextAdder]
  -- there are no cycles in the nextAdder function.
  all fa: FullAdder | not reachable[fa, fa, RCA.nextAdder]  
}

Notice that we've used implies to limit the power of the all quantifier: it doesn't impose the reachability condition on every FullAdder, but rather on all of them except RCA.firstAdder. This is a common pattern when you want to assert something is true, but only contingently.

Case sensitivity and variable names

In our model so far, FullAdder is the name of a datatype. When writing the constraints above, I said: "for every full adder..." and named this arbitrary adder fa. These two, FullAdder and fa are different. For a start, FullAdder is defined within the entire model, but fa is only defined within the scope of the all quantifier.

We've used the reachable helper before, but it's worth mentioning again: A is reachable from B via one or more applications of f if and only if reachable[A, B, f] is true. That "one or more applications" is important, and is why we needed to add the (fa != RCA.firstAdder) implies portion of the first constraint: RCA.firstAdder shouldn't be the successor of any full adder, and if it were its own successor, that would be a cycle in the line of adders. If we had left out the implication, and written just all fa: FullAdder | reachable[fa, RCA.firstAdder, RCA.nextAdder], RCA.firstAdder would need to have a predecessor, which would contradict the second constraint.

More Predicates

Before we write some examples for wellformed, let's also try to model how each adder should behave, given that it's wired up to other adders in this specific order. Let's write a couple of helpers first, and then combine them to describe the behavior of each adder, given its place in the sequence.

When is an adder's output bit set to true?

Just like predicates can be used as boolean-valued helpers, functions can act as helpers for arbitrary return types. Let's try to write one that says what the output bit should be for a specific full adder, given its input bits.

// Helper function: what is the output sum bit for this full adder?
fun adder_S_RCA[f: one FullAdder]: one Digit  {
  // Our job is to fill this in with an expression for the output sum bit
} 

Looking at the table above, the adder's output value is true if and only if an odd number of its 3 inputs is true. That gives us 4 combinations:

  • A, B, and CIN (all 3 are true);
  • A only (1 is true);
  • B only (1 is true); or
  • CIN only (1 is true).
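Those four combinations are exactly the cases where an odd number of the three inputs is true (the three-way XOR). A quick exhaustive check in Python (outside Forge) confirms the two formulations agree:

```python
from itertools import product

def sum_bit_cases(a, b, cin):
    # the four-way disjunction from the text
    return ((a and b and cin) or
            (a and not b and not cin) or
            (not a and b and not cin) or
            (not a and not b and cin))

# ...is the same as "an odd number of inputs is true"
for a, b, cin in product([False, True], repeat=3):
    assert sum_bit_cases(a, b, cin) == ((a + b + cin) % 2 == 1)
```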

This is where we need to remember that the sig T is not a Forge formula yet; to make it into one, we need to explicitly test whether each value is equal to T. We'll use two new Forge constructs to write the function body:

  • The let construct makes it easier to write the value for each of these wires. A let looks similar to a quantifier, but it only introduces some local helper syntax. If I write let A = (f.a_in = T) | ..., I can then use A in place of the tedious (f.a_in = T).
  • Expression if-then-else lets us produce a value based on a condition, sort of like the C ? X : Y operator in languages like JavaScript. If I write something like (A and B and C) => T else F this evaluates to T whenever A, B, and C are all true, and F otherwise.

Now we can write:

// Helper function: what is the output bit for this full adder?
fun adder_S_RCA[f: one FullAdder]: one Digit  {
  // "T" and "F" are values, we cannot use them as Forge formulas. 
  let A = (f.a_in = T), B = (f.b_in = T), CIN = (f.carry_in = T) |
   -- Expression if-then-else: if any of these conditions holds...
	 ((     A  and      B  and      CIN)  or 
    (     A  and (not B) and (not CIN)) or 
    ((not A) and      B  and (not CIN)) or 
    ((not A) and (not B) and      CIN))
      -- ...then T...
	 	  =>   T
      -- ...otherwise F.
      else F
} 

Couldn't we have just used a pred here?

It might be a bit strange to write a helper function that returns a Digit, rather than a predicate directly. We could make a pred work, but we'd still have to eventually use T and F somewhere, since they are the values that the output bits can take on.

Is the blank space in that example significant?

Nope. I added it for clarity, because it's much harder to read without the extra space to make it apparent where the nots are applied. Likewise, you don't need to wrap a negation in parentheses; I just think (not A) is clearer than not A in this sort of big expression.

Implies and expression if-then-else

You can also write implies as =>; the two keywords are interchangeable in Forge. This means => does double duty: in a formula context it means implication, while in an expression context (with an else branch) it means expression if-then-else. To avoid confusion, always ask yourself whether you are trying to identify a thing in an instance, like a full-adder atom or an integer, or write a constraint which may or may not be true in an instance.

When is an adder's carry bit set to true?

This one is quite similar. The carry bit is set to true if and only if 2 or 3 of the adder's inputs are true:

  • B and CIN (2 are true);
  • A and CIN (2 are true);
  • A and B (2 are true); or
  • A, B, and CIN (all 3 are true).

As before, we'll use let and expression if-then-else, and add (decorative) blank space to make the function more readable.

// Helper function: what is the output carry bit for this full adder?
fun adder_cout_RCA[f: one FullAdder]: one Digit {
 let A = (f.a_in = T), B = (f.b_in = T), CIN = (f.carry_in = T) |
     (((not A) and      B  and      CIN) or 
      (     A  and (not B) and      CIN) or 
      (     A  and      B  and (not CIN)) or 
      (     A  and      B  and      CIN)) 
	      =>   T
        else F
} 

Adder Behavior

Finally, what ought an adder's behavior to be? Well, we need to specify its output bits in terms of its input bits. We'll also add a constraint that says the full adders are connected in a line. More concretely, if there is a successor, its input carry bit is equal to the current adder's output carry bit. Here's a picture of what we want to say:

TODO: fill picture

And here's the Forge predicate:

pred fullAdderBehavior[f: FullAdder] {
  -- Each full adder's outputs are as expected
  f.sum_out = adder_S_RCA[f]
  f.carry_out = adder_cout_RCA[f]
  -- Full adders are chained appropriately
  (some RCA.nextAdder[f]) implies (RCA.nextAdder[f]).carry_in = f.carry_out 
}

Wouldn't it be better to put the carry-bit connection in wellformed, or somewhere else?

That's a good point. The values of f.sum_out and f.carry_out are part of the full adder's behavior, but the way the wires are connected in RCA.nextAdder is not.

If I were going to re-write this model, I would probably move that line into somewhere that is responsible for connecting the adders: perhaps a predicate for the ripple-carry adder. But I haven't done that—hoping to provoke just this question!

The general design principle here is to think about compositionality and reuse: we'd like to be able to use the same predicates to reason about full adders by themselves, or what would happen if we connected them differently. As written, the fullAdderBehavior predicate doesn't allow for that; we'd have to refactor it. But I'll leave that as an exercise for now.

Finally, we'll make a predicate that describes the behavior of the overall ripple-carry adder:

// Top-level system specification: compose preds above
pred rca {  
  wellformed
  all f: FullAdder | fullAdderBehavior[f] 
}

Notice what we've done.

Here's something to keep in mind for when we start the next chapter. By wiring together full adders into a sequence via the rca predicate, we are now implicitly hinting at time in our model: signal flows through each adder, in order, over time. We'll re-use this same technique in the next chapter to combine different system states into a succession of them that represents a complete run of the system.

Now we're ready to write some examples. We'll make a pair of examples for wellformed and an overall example for the full system. In practice, we'd probably want to write a couple of examples for fullAdderBehavior as well, but we'll leave those out for brevity.

Examples

Always try to write at least some positive and negative examples.

Positive Example

example twoAddersLinear is {wellformed} for {
  RCA = `RCA0 
  FullAdder = `FullAdder0 + `FullAdder1
  -- Remember the back-tick mark here! These lines say that, e.g., for the atom `RCA0, 
  -- its firstAdder field contains `FullAdder0. And so on.
  `RCA0.firstAdder = `FullAdder0
  `RCA0.nextAdder = `FullAdder0 -> `FullAdder1
}

Notice that this example is limited.

Because we are testing wellformed, we left out fields that didn't matter to that predicate. Forge will feel free to adjust them as needed. When a field is left unspecified, the example is said to be partial, and it becomes a check for consistency. E.g., in this case, the example passes because the partial instance given can satisfy wellformed—not that it must satisfy wellformed—although in this case the difference is immaterial because wellformed really doesn't care about any of the other fields.

Negative Example

example twoAddersLoop is {not wellformed} for {
  RCA = `RCA0 
  FullAdder = `FullAdder0 + `FullAdder1
  `RCA0.firstAdder = `FullAdder0
  `RCA0.nextAdder = `FullAdder0 -> `FullAdder1 + `FullAdder1 -> `FullAdder0
}

Run

Let's have a look at a ripple-carry adder in action. We'll pick a reasonably small number of bits: 4.

run {rca} for exactly 4 FullAdder

(FILL: screenshot)

Verification

Ok, we've looked at some of the model's output, and it seems right. But how can we be really confident that the ripple-carry adder works? Can we use our model to verify the adder? Yes, but we'll need to do a bit more modeling.

Exercise: What does it mean for the adder to "work"?

Think, then click!

For one thing, it had better produce a series of boolean outputs that correspond to the output we'd get if we just did the addition. That is, if we add together the numbers 2 (10) and 3 (11) we should expect to get 5—which should be 101 in binary, provided we always set the bit-width high enough to match the number of bits we're adding together.

Let's augment our model to check this. We'll ask Forge for an instance where the ripple-carry adder produces a different result (taken as the sum of the outputs of each full adder) than the expected result (produced via Forge's add function).


When I'm expanding a model in this way, I like to augment instances with extra fields that exist just for verification, and which aren't part of the system we're modeling. Here, we'll keep track of the place-values of each full adder, which we can then use to compute the "true" value of its input or output. E.g., the first full adder would have place-value 1, and its successors would have place-value 2, then 4, etc.

Helper values

Sometimes you'll hear this sort of new field or value referred to as a "ghost": it isn't real; it doesn't exist in the actual system.

We could store this field in the FullAdder sig, but let's keep the original unmodified, and instead add a Helper sig. This keeps the for-verification-only fields separate from the model of the system:

one sig Helper {
  place: func FullAdder -> Int
}

The ripple-carry adder gives us the context we need to speak of the place value each full adder ought to have. We can write this for the first adder easily:

-- The "places" helper value should agree with the ordering that the RCA establishes.
pred assignPlaces {
  -- The least-significant bit is 2^0
  Helper.place[RCA.firstAdder] = 1
  -- ...
}

Now we need to, in effect, write a for-loop or a recursive function that constrains places for all the other adders. But Forge has no recursion or loops! Fortunately, it does have the all quantifier, which lets us define every other adder's place value in terms of its predecessor:

-- The "places" helper value should agree with the ordering that the RCA establishes.
pred assignPlaces {
  -- The least-significant bit is 2^0
  Helper.place[RCA.firstAdder] = 1
  -- Other bits are worth 2^(i+1), where the predecessor is worth 2^i.
  all fa: FullAdder | some RCA.nextAdder[fa] => {    
    Helper.place[RCA.nextAdder[fa]] = multiply[Helper.place[fa], 2]
  }
}

When you have quantification and helper fields, you can often avoid needing real iteration or recursion.

We'll add a helper function for convenience later:

fun actualValue[b: Digit, placeValue: Int]: one Int {
  (b = T) => placeValue else 0
}

The Requirement

Let's try to express our requirement that the adder is correct. Again, we'll phrase this as: for every full adder, the true value of its output is the sum of the true values of its inputs (where "true value" means the value of the boolean, taking into account its position). We might produce something like this:

pred req_adderCorrect_wrong {
  (rca and assignPlaces) implies {
    all fa: FullAdder | { 
        actualValue[fa.sum_out, Helper.place[fa]] = add[actualValue[fa.a_in, Helper.place[fa]], 
                                                  actualValue[fa.b_in, Helper.place[fa]]]
    }
  }
}

And then we'll use it in a test. It's vital that we have a high-enough bitwidth, so that Forge can count up to the actual expected result without overflowing. Forge ints are signed, so we actually need a bigger bit-width than the number of full adders. If we have, say, 6 full adders, we might end up producing a 7-bit output (with carrying). A 7-bit value can conservatively hold up to 2^7 - 1 = 127. To count up that high, we need to use 8 bits in Forge, giving the solver all the numbers between -128 and 127, inclusive.

Directly testing satisfiability

So far, we've only seen examples and assertions. Both of these are built using Forge's basic satisfiability checking. We can access this directly using the test expect feature. You can write that a run:

  • should have instances (i.e., be satisfiable): is sat;
  • should not have instances (i.e., be unsatisfiable, within the bounds given): is unsat; or
  • should have no counterexample instances (within the bounds given): is theorem.

-- Ask Forge to check the satisfiability of something...
test expect {  
  -- Is it always true, up to these bounds, that `req_adderCorrect_wrong` holds?
  r_adderCorrect_wrong: {req_adderCorrect_wrong} for 6 FullAdder, 1 RCA, 8 Int is theorem
}

However, this requirement fails—Forge finds a counterexample.

Exercise: What's wrong? Is the adder broken, or might our property be stated incorrectly?

Think, then click!

We forgot to take carrying into account! Any time a full adder carries a bit, it's dropped by the left-hand side of the above equation.

Notice how even if the model (or system) is correct, sometimes the property itself is wrong. Always be skeptical about your properties, just like you're skeptical about your model.


Here's another attempt:

pred req_adderCorrect {
  (rca and assignPlaces) implies {
    all fa: FullAdder | { 
        -- Include carrying, both for input and output. The _total_ output's true value is equal
        -- to the sum of the total inputs' true values.

        -- output value bit + output carry bits; note carry value is *2 (and there may not be a "next adder")
        add[actualValue[fa.sum_out, Helper.place[fa]], 
            multiply[actualValue[fa.carry_out, Helper.place[fa]], 2]] 
        = 
        -- input a bit + input b bit + input carry bit
        add[actualValue[fa.a_in, Helper.place[fa]],     
            actualValue[fa.b_in, Helper.place[fa]],    
            actualValue[fa.carry_in, Helper.place[fa]]]  
    }
  }
}

Now when we run the check, it passes. There's just one problem—it only passes eventually. The verification step took over a minute on my laptop! That's rather slow for a model this size.

Optimizing Verification

When this sort of unexpected slowdown happens, it's often because we've given the solver too much freedom, causing it to explore a much larger search space than it should have to. This is especially pronounced when we expect an "unsatisfiable" result—then, the solver really does need to explore everything before concluding that no, there are no solutions. We're in that situation here, since we're hoping there are no counter-examples to correctness.

Exercise: What did we leave the solver to figure out on its own, that we maybe could give it some help with?

Think, then click!

There are at least two things.

  • First, the exact ordering of full adders isn't something we provided. We just said "create up to 6 of them, and wire them together in a line". Considering just the 6-adder case (and not the 5-adder case, 4-adder case, etc.), how many ways are there to arrange the adders? 6! = 720. Unless the solver can detect and eliminate these symmetries, it's doing a lot more work than it needs to.
  • Second, we said that Helper.place mapped full adders to integers. But does the solver need to consider all integers? No! Just 1, 2, 4, 8, and so on. The vast majority of integers in the scope we provided cannot be used—and the solver will have to discover that on its own.

These both present opportunities for optimization! For now, let's just tackle the first one: we need to somehow give Forge a specific ordering on the adders, rather than letting the solver explore all possible orderings. E.g., maybe we want a series of atoms FullAdder0, FullAdder1, ..., FullAdder5, whose ordering RCA.nextAdder respects.

We could try to express this as a constraint:

pred orderingOnAdders {
  some disj fa0, fa1, fa2, fa3, fa4, fa5: FullAdder | {
    RCA.firstAdder = fa0
    RCA.nextAdder[fa0] = fa1 
    RCA.nextAdder[fa1] = fa2
    RCA.nextAdder[fa2] = fa3 
    RCA.nextAdder[fa3] = fa4 
    RCA.nextAdder[fa4] = fa5 
  }
}

However, this won't be very effective.

Exercise: Why won't adding orderingOnAdders to our set of constraints actually help much, if at all?

Think, then click!

Because all of that is already implied by the other constraints we have. It's true that sometimes rephrasing a constraint like this can have a big impact on performance if the solver can figure out how to use it well, but in this case the problem is one of symmetries; the solver would just keep checking all possibilities for the fa0, fa1, etc. variables anyway.

Fortunately, there's a much better option.


Recall that Forge works by searching for satisfying instances within some large possibility space, and that this space is restricted by the bounds given. The search process is run by a sophisticated solver, but the problem that the solver itself gets looks nothing like Forge. The process of solving a Forge problem thus has three separate stages:

  • express the solution space in a form the solver understands, which is usually a large set of boolean variables;
  • convert the constraints into a form the solver understands, which needs to be in terms of the converted solution-space variables; and only then
  • invoke the solver on the converted problem.

Adding constraints will affect the later steps, but we'd love to give hints to the translator even earlier in the process. We'll talk more about exactly how this works later, but for now, we'll add a bounds annotation to our run: that the nextAdder field is partial-linear, or nextAdder is plinear. Linearity means that the atoms which nextAdder maps should be pre-arranged in a fixed order before they get to the solver at all. Partial linearity means that the ordering may not use all of the potential atoms.

test expect {  
  r_adderCorrect: {req_adderCorrect} for 6 FullAdder, 1 RCA, 8 Int for {nextAdder is plinear} is theorem
}

This wasn't hard to add: it's just another {}-delimited instruction that you can add to any run, test, etc. command. Now Forge finishes the check in under a second on my laptop. Eliminating symmetries can make a huge difference!

Bounds vs. Constraints

Notice that the bounds annotation is grouped after the numeric bounds, not with the constraints. This is because the two are separate. If we had tried to put nextAdder is plinear in as a constraint in our predicate, we would have gotten an error, because Forge doesn't know how to interpret it as a constraint, only how to use it to shape the solution space passed to the solver.

Anticipated Questions: Static Models

What Happens When Forge Searches?

Our first run command in Forge was from tic-tac-toe: run { some b: Board | wellformed[b]}.

How did Forge actually run the search? Surely it isn't checking all possible boards one at a time; that wouldn't scale at all, and we need to scale in order to model and reason about real systems. No; Forge can efficiently search much larger problem spaces than this.

Let's step through how Forge solves a problem. First, there's a space of potential solutions that it considers. Using the default engine, this space is finite—although possibly enormous. There are 3^9 = 19,683 possible 3-by-3 tic-tac-toe boards, which isn't too bad. But a 4-by-4 board would have 3^16 = 43,046,721 possibilities, and real systems might have more possible states than electrons in the observable universe. So we'd better be doing something smarter than checking one instance at a time.

The bounds of the space to search are set by the run command. In this case, no bounds were given, so defaults were used: instances may contain up to 4 atoms of each top-level sig type, and the integers -8 through 7 (inclusive) but no more. We can adjust that bound by adding instructions, e.g., run { some b: Board | wellformed[b]} for exactly 2 Board will only seek instances where there are exactly 2 Board atoms.

The bounds describe the search space. This space is populated with instances, some of which satisfy the constraints being run and some of which don't. In principle, there exist other instances too—entirely outside the space being searched! E.g., if we said to search up to 4 Board atoms, an instance with 100 such atoms might (or might not) satisfy our constraints, but wouldn't be considered in the search:

Once this space is defined, and only then, a sophisticated constraint-solving engine—a boolean "SAT" or "SMT" solver—takes charge. The engine uses techniques like backtracking and heuristics to try to avoid unnecessary work in solving the problem. This divides the search space into "satisfying" and "not satisfying" areas. The solver returns the satisfying instances in some order:

CSCI 1710

If you're in CSCI 1710, one of your assignments will be to build such a constraint solver, and even to see how it performs "plugged in" to Forge and solving real run commands you write.

Performance Implications

As you may have seen in the ripple-carry adder section, these two phases of solving work very differently, and letting the solver infer constraints is often less efficient than giving it tighter bounds, because the latter restricts the overall space to be searched beforehand. Some optimization techniques in Forge need to be explicitly applied in the first phase, because the solver engine itself splits the two phases apart. For more on this, read the upcoming Q&A section for traces and events.

"Nulls" in Forge

In Forge, there is a special value called none. It's analogous (but not exactly the same!) to a None in languages like Python.

Suppose I add this predicate to our run command in the tic-tac-toe model:

pred myIdea {
    -- no 2 locations can share the same mark
    all rowA, colA, rowB, colB: Int | 
        (rowA != rowB or colA != colB) implies
            Board.board[rowA][colA] != 
            Board.board[rowB][colB]
}

I'm trying to express that every entry in the board is different. This should easily be true about, e.g., the initial board, as there are no pieces there at all. But do you think the empty board satisfies this predicate?

Think (or try it in Forge) then click!

It's very likely this predicate would not be satisfied by the empty board. Why?

Because none equals itself, and it's the value in each location of the board before X or O move there.

Thus, when you're writing constraints like the above, you need to watch out for none: the value for every cell in the initial board is equal to the value for every other cell!

Reachability and none

The none value in Forge has at least one more subtlety: none is "reachable" from everything if you're using the built-in reachable helper predicate. That has an impact even if we don't use none explicitly. If I write something like: reachable[p.spouse, Nim, parent1, parent2] I'm asking whether, for some person p, their spouse is an ancestor of Nim. If p doesn't have a spouse, then p.spouse is none, and so this predicate would yield true for p.

This is why it's often advisable to qualify your use of reachable. E.g., I could write some p.spouse and reachable[p.spouse, Nim, parent1, parent2].

Some as a Quantifier Versus Some as a Multiplicity

The keyword some is used in 2 different ways in Forge:

  • it's a quantifier, as in some b: Board, p: Player | winner[b, p], which says that somebody has won in some board (and gives us a name for that board, and also for the winner); and
  • it's a multiplicity operator, as in some Board.board[1][1], which says only that the middle cell of the board is populated (recall that the board indexes are 0, 1, and 2).

Don't be afraid to use both; they're both quite useful! But remember the difference.

Guarding Quantifiers; Implies vs. "Such That"

You can read some row : Int | ... as "There exists some integer row such that ...". The transliteration isn't quite as nice for all; it's better to read all row : Int | ... as "In all integer rows, it holds that ...".

If you want to further restrict the values used in an all, you'd use implies. But if you want to add additional requirements for a some, you'd use and. Here are 2 examples:

  • All: "Everybody who has a parent1 doesn't also have that person as their parent2": all p: Person | some p.parent1 implies p.parent1 != p.parent2.
  • Some: "There exists someone who has a parent1 and a spouse": some p: Person | some p.parent1 and some p.spouse.

Technical aside: The type designation on the variable can be interpreted as having a character similar to these add-ons: and (for some) and implies (for all). E.g., "there exists some row such that row is an integer and ...", or "In all rows, if row is an integer, it holds that...".

There Exists some Atom vs. Some Instance

Forge searches for instances that satisfy the constraints you give it. Every run in Forge is about satisfiability; answering the question "Does there exist an instance, such that...".

Crucially, you cannot write a Forge constraint that quantifies over instances themselves. You can ask Forge "does there exist an instance such that...", which is pretty flexible on its own. E.g., if you want to check that something holds of all instances, you can ask Forge to find counterexamples. This is exactly what assert ... is necessary for ... does; it searches for counterexample instances.

Tip: Testing Predicate Equivalence

Checking whether or not two predicates are equivalent is the core of quite a few Forge applications---and a great debugging technique sometimes. (We saw this very briefly for binary trees, but it's worth repeating.)

How do you check for predicate equivalence? Well, suppose we tried to write a predicate in two different ways, like this:

pred myPred1 {
    some i1, i2: Int | i1 = i2
}
pred myPred2 {
    not all i1, i2: Int | i1 != i2
}
assert myPred1 is necessary for myPred2
assert myPred2 is necessary for myPred1

These assert statements will pass, because the two predicates are logically equivalent. But if we had written (forgetting the not):

pred myPred2 {
    all i1, i2: Int | i1 != i2
}

One of the assertions would fail, yielding an instance in Sterling that you could explore with the evaluator. If you get an instance where the two predicates aren't equivalent, you can use the Sterling evaluator to find out why: try different subexpressions and discover which one is producing an unexpected result!

One Versus Some

Classical logic provides the some and all quantifiers, but Forge also gives you no, one and lone. The no quantifier is fairly straightforward: if I write no row, col: Int | Board.board[row][col] = X, it's equivalent to all row, col: Int | Board.board[row][col] != X. That is, X hasn't yet put a mark on the board.

The one quantifier is for saying "there exists a UNIQUE ...". As a result, there are hidden constraints embedded in its use. one row: Int | Board.board[row][0] = X really means, roughly, some row: Int | { Board.board[row][0] = X and all row2: Int | { row2 != row implies Board.board[row2][0] != X }}: there is some row where X has moved in the first column, but only one such row. The lone quantifier is similar, except that instead of saying "exactly one", it says "either one or none".

This means that interleaving one or lone with other quantifiers can be subtle. Consider what happens if I write one row, col: Int | Board.board[row][col] = X. This means that there is exactly one square on the board where X has moved. But what about one row: Int | one col: Int | Board.board[row][col] = X?

Exercise: Test this out using Forge! Try running:

run { not { 
      (one row, col: Int | Board.board[row][col] = X) iff
      (one row: Int | one col: Int | Board.board[row][col] = X)
}}

You'll get an instance showing you that the two aren't equivalent. What's the problem?

Think, then click!

The problem is that one row, col: Int | ... says that there exists one unique pair of indexes, but one row: Int | one col: Int | ... says that there exists one unique index such that there exists one unique index... These are not the same.

Because thinking through one and lone quantifiers can be subtle, we strongly suggest not using them except for very simple constraints. (Using them as a multiplicity, like saying one Tim.office is fine.)

Satisfiability Testing and a Pitfall

The test expect syntax lets you check for satisfiability directly. This is quite powerful, and lets us illustrate a fairly common mistake. Here is a test block with 2 tests in it. Both of them may look like they are comparing myPred1 and myPred2 for equivalence:

test expect {
    -- correct: "no counterexample exists"
    -- Forge tries to find an instance where myPred1 and myPred2 disagree
    p1eqp2_A: {
        not (myPred1 iff myPred2)        
    } is unsat

    -- incorrect: "it's possible to satisfy what i think always holds"
    -- Forge tries to find an instance where myPred1 and myPred2 happen to agree
    p1eqp2_B: {
        myPred1 iff myPred2
    } is sat
}

These two tests do not express the same thing! One asks Forge to find an instance where the predicates are not equivalent. If it can find such an instance, we know the predicates are not equivalent, and can see why by viewing the instance. The other test asks Forge to find an arbitrary instance where they are equivalent. But that needn't be true in all instances, just the one that Forge finds.

Quantifiers and Performance

Forge works by converting your model into a boolean satisfiability problem. That is, it builds a boolean circuit where inputs making the circuit true satisfy your model. But boolean circuits don't have a notion of quantifiers, so they need to be compiled out.

The compiler has a lot of clever tricks to make this fast, but if it can't apply those tricks, it uses a basic idea: an all is just a big and, and a some is just a big or. This very simple conversion process increases the size of the circuit exponentially in the depth of quantification.

Let's look at an example of why this matters. Here is a reasonable way to approach writing a predicate for a model solving the 8-queens problem, where Forge is searching for how to place 8 queens on a chessboard such that none of them can attack the other:

pred notAttacking {
  -- For every pair of queens
  all disj q1, q2 : Queen | {
    -- There's somewhere to put them: (r1,c1) and (r2,c2)
    some r1, c1, r2, c2: Int | {
      // ... such that (r1,c1) and (r2,c2) aren't on a shared line of attack
    } } }

The problem is: there are 8 queens, and 16 integers. It turns out this is a pathological case for the compiler, and it runs for a really long time. In fact, it runs for a long time even if we reduce the scope to 4 queens! The default verbosity option shows the blowup here in time-translation, which gives the number of milliseconds used to convert the model to a boolean circuit:

:stats ((size-variables 410425) (size-clauses 617523) (size-primary 1028) (time-translation 18770) (time-solving 184) (time-building 40)) :metadata ())
#vars: (size-variables 410425); #primary: (size-primary 1028); #clauses: (size-clauses 617523)        
Transl (ms): (time-translation 18770); Solving (ms): (time-solving 184)

Ouch! To avoid this blowup, we might try a different approach that uses fewer quantifiers. In fact, we can write the constraint without referring to specific queens at all: just 4 integers representing the positions.

How?

Does the identity of the queens matter at all, beyond their location?

If you encounter bad performance from Forge, check for this sort of unnecessary nested quantifier use. It can often be fixed by reducing quantifier nesting, or by narrowing the scope of what's being quantified over.

From Boards to Games

Now that we've gotten some experience modeling in Forge, let's start thinking about change.

What do you think a game of tic-tac-toe looks like? Crucially, a game involves moves.

Exercise: How could we model the moves between board states? (Hint: start thinking in terms of a graph—nodes and edges!)

Think, then click!

It's often convenient to use the following idiom.

Think of the game as a big graph, where the nodes are the states (possible board configurations) and the edges are transitions between states (in this case, legal moves of the game). Here's a rough sketch:


A game of tic-tac-toe is a sequence of steps in a state graph, starting from the empty board. Let's model it.

First, what does a move look like? A player puts their mark at a specific location. In Forge, we'll represent this using a transition predicate: a predicate that says when it's legal for one state to evolve into another. We'll often call these the pre-state and post-state of the transition:

pred move[pre: Board, row: Int, col: Int, p: Player, post: Board] {
  // ...
}

What constraints should we add? It's useful to divide a transition predicate into:

  • a guard, which allows the move only if the pre-state is suitable; and
  • an action, which defines what is in the post-state based on the pre-state and the move parameters.

For the guard, in order for the move to be valid, it must hold that in the pre-state:

  • nobody has already moved at the target location; and
  • it's the moving player's turn.

For the action:

  • the new board is the same as the old, except for the addition of the player's mark at the target location.

Now we can fill in the predicate. Let's try something like this:

pred move[pre: Board, row: Int, col: Int, p: Player, post: Board] {
  -- guard:
  no pre.board[row][col]   -- nobody's moved there yet
  p = X implies XTurn[pre] -- appropriate turn
  p = O implies OTurn[pre]  
  
  -- action:
  post.board[row][col] = p
  all row2: Int, col2: Int | (row!=row2 and col!=col2) implies {        
     post.board[row2][col2] = pre.board[row2][col2]     
  }  
}

There are many ways to write this predicate. However, we're going to stick with this general form because it calls out an important point. Suppose we had only written post.board[row][col] = p for the action, without the all on the following lines. Those added lines, which we'll call a frame condition, say that all other squares remain unchanged; without them, the contents of any other square might change in any way. Leaving them out would cause an underconstraint bug: the predicate would be too weak to accurately describe moves in tic-tac-toe. But that's not the only source of problems...

Exercise: Could there be a bug in this predicate? (Run Forge and find out!)

Think, then click

The all row2... formula says that for any board location where both the row and column differ from the move's, the board remains the same. But is that what we really wanted? Suppose X moves at location 1, 1. Then of the 9 locations, which are actually protected?

Row  Column  Protected?
0    0       yes
0    1       no (column 1 = column 1)
0    2       yes
1    0       no (row 1 = row 1)
1    1       no (as intended)
1    2       no (row 1 = row 1)
2    0       yes
2    1       no (column 1 = column 1)
2    2       yes

Our frame condition was too weak! We need to have it take effect whenever either the row or column is different. Something like this will work:

  all row2: Int, col2: Int | 
    ((row2 != row) or (col2 != col)) implies {    
       post.board[row2][col2] = pre.board[row2][col2]     
  }  

Exercise: Make the suggested fix to the predicate above. Then, as a separate experiment, comment out the 3 frame-condition lines and run the model. Do you see moves where the other 8 squares change arbitrarily? You should, because without the frame condition Forge is free to make such changes.

Property Preservation

Once someone wins a game, does their win still persist, even if more moves are made? I'd like to think so: moves never get undone, and in our model winning just means the existence of 3-in-a-row for some player. We probably even believe this property without checking it. However, it won't always be so straightforward to show that properties are preserved by the system. We'll check this one in Forge as an example of how you might prove something similar in a more complex system.

Looking ahead

This is our first step into the world of verification. Asking whether or not a program, algorithm, or other system always satisfies some assertion is a core problem in engineering, and has a centuries-long history.

We'll tell Forge to find us pairs of states, connected by a move: the pre-state before the move, and the post-state after it. That's any potential transition in tic-tac-toe—at least, following the rules as we defined them. To apply this technique, all we need to do is add two more constraints that reflect a winner existing in the pre-state, but that there's no winner in the post-state.

pred winningPreservedCounterexample {
  -- There is some pair of states
  some pre, post: Board | {
    -- such that the first transitions to the second
    some row, col: Int, p: Player | 
      move[pre, row, col, p, post]
    -- the first state has a winner
    some p: Player | winner[pre, p]
    -- the second state has no winner
    all p: Player | not winner[post, p]
  }
}
run {
  all s: Board | wellformed[s]
  winningPreservedCounterexample
}

The run is unsatisfiable. Forge can't find any counterexamples. We'll see this reported as "UNSAT" (short for "unsatisfiable") in the visualizer.

Next button

Remember that the visualizer also has a "Next" button; you can browse many different solution instances. Of course, if you press it enough times, Forge (eventually) runs out of solutions to show.

Generating Complete Games

Recall that our worldview for this model is that systems transition between states, and thus we can think of a system as a directed graph. If the transitions have arguments, we'll sometimes label the edges of the graph with those arguments. This view is sometimes called a discrete event model, because one event happens at a time. Here, the events are moves of the game. In a bigger model, there might be many different types of events.

Today, we'll ask Forge to find us full traces of the system, starting from an initial state. We'll also add a Game sig to incorporate some metadata.

-- Generate *one* game of tic-tac-toe
one sig Game {
  -- What state does the game start in?
  initialState: one Board,
  -- How does the game evolve from state to state?
  nextState: pfunc Board -> Board
}

pred traces {
    -- The trace starts with an initial state
    starting[Game.initialState]
    no sprev: Board | Game.nextState[sprev] = Game.initialState
    -- Every transition is a valid move
    all s: Board | some Game.nextState[s] implies {
      some row, col: Int, p: Player |
        move[s, row, col, p, Game.nextState[s]]
    }
}

By itself, this wouldn't be quite enough; we might see a bunch of disjoint traces. We could add more constraints manually, but there's a better option: tell Forge, at runtime, that nextState represents a linear ordering on states. This is similar to what we did back in the ripple-carry adder:

run { traces } for {nextState is linear}

It's worth recalling what's happening here. The phrase nextState is linear isn't a constraint; it's a separate annotation given to Forge alongside a run or a test. Never put such an annotation in a constraint block; Forge won't understand it. These annotations narrow Forge's bounds (the space of possible worlds to check) before the solver begins its work.

In general, Forge syntax allows such annotations after numeric bounds. E.g., if we wanted to see full games, rather than unfinished game prefixes (the default bound on any sig, including Board, is up to 4) we could have asked:

run {
  traces
} for exactly 10 Board for {nextState is linear}

You might notice that because of this, some traces are excluded. That's because nextState is linear forces exact bounds on Board. This is in contrast to plinear, which we used for the ripple-carry adder, and which didn't force exact bounds. Use whichever of the two is more appropriate to your needs.
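If those shorter, unfinished prefixes are what you want, the non-exact variant might look something like this (a sketch, reusing the traces predicate from above):

```forge
-- plinear: nextState linearly orders the Boards that actually appear,
-- but doesn't force all 10 to be used, so shorter traces remain possible
run { traces } for 10 Board for {nextState is plinear}
```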

The Evaluator

Moreover, since we're now viewing a single fixed instance, we can evaluate Forge expressions in it. This is great for debugging, but also for just understanding Forge a little bit better. Open the evaluator at the bottom of the right-side tray, under theming, and enter an expression or constraint:

Type in something like some s: Board | winner[s, X]. Forge should give you either #t (for true) or #f (for false) depending on whether the game includes X winning in some state.

Optimization

You might notice that this model takes a while to run. Something happened after we started reasoning about full games. Why might that be? Let's re-examine our bounds and see if there's anything we can adjust. In particular, here's what the evaluator says we've got for integers:

Wow---wait, do we really need to be able to count up to 7 for this model? Even more, do we really need to count all the way down to -8? Probably not. If we change our integer bounds to 3 Int we'll still be able to use 0, 1, and 2, and the search space is much smaller.
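Concretely, the new bound goes alongside the others in the run command. A sketch, reusing the wellformed and traces predicates from this model:

```forge
-- Shrink the integer bit-width from the default 4 bits to 3,
-- so Int ranges over -4..3 instead of -8..7
run {
  wellformed
  traces
} for exactly 10 Board, 3 Int for {nextState is linear}
```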

Back To Tic-Tac-Toe: Ending Games

Recall that we just ran this command:

run {
  wellformed
  traces
} for exactly 10 Board for {nextState is linear}

Nothing without a command

Without a run, an example, or a similar command, running a Forge model will do nothing.

From this run command, Forge will find traces of the system (here, games of Tic-Tac-Toe) represented as a linear sequence of exactly 10 Board atoms.

Do you have any worries about the way this is set up?

Think, then click! Are all Tic-Tac-Toe games 10 states long?

Well, maybe; it depends on how we define a game. If we want a game to stop as soon as nobody can win, our exactly 10 Board bound is going to prevent us from finding games that are won before the final cell of the board is filled.

Let's add the following guard constraint to the move transition predicate, which forces games to end as soon as somebody wins.

all p: Player | not winner[pre, p]

Now we've got problems, because once we add this constraint, Forge will omit games that end before all squares of the board are filled.

This behavior, which may initially seem strange, exists for two reasons:

  • History: Forge's ancestor language, Alloy, has something very similar to is linear, with the same semantics.
  • Performance: since the is linear annotation is almost always used for trace-generation, and trace-generation solving time grows (in the worst case) exponentially in the length of the trace, we will almost always want to reduce unnecessary uncertainty. Forcing the trace length to always be the same reduces the load on the solver, and makes trace-generation somewhat more efficient.

But now we need to work around this limitation. Any ideas? Hint: do we need to have only one kind of transition in our system?

Think, then click!

No. A common way to allow trace length to vary is by adding a "do nothing" transition. (In the literature, this is called a stutter transition.)

The trick is in how to add it without also allowing a "game" to consist of nobody doing anything. To do that requires some more careful modeling.


Let's add an additional transition that does nothing. We can't "do nothing" in the predicate body, though—an empty predicate body would just mean anything could happen. What we mean to say is that the state of the board remains the same, even if the before and after Board objects differ.

pred doNothing[pre: Board, post: Board] {
    all row2: Int, col2: Int | 
        post.board[row2][col2] = pre.board[row2][col2]
}

Variable names

Remember that row2 and col2 are just variable names that could stand for any Int; they aren't necessarily the row or column index value 2.

We also need to edit the traces predicate to allow doNothing to take place:

pred traces {
    -- The trace starts with an initial state
    starting[Game.initialState]
    no sprev: Board | Game.nextState[sprev] = Game.initialState
    -- Every transition is a valid move
    all s: Board | some Game.nextState[s] implies {
      some row, col: Int, p: Player | {
        move[s, row, col, p, Game.nextState[s]] 
      }
      or
      doNothing[s, Game.nextState[s]]      
    } 
}

As it stands, this fix solves the overconstraint problem of never seeing an early win, but introduces a new underconstraint problem: we don't want doNothing transitions to happen just anywhere!

Here's how I like to fix it:

pred gameOver[s: Board] {
  some p: Player | winner[s, p]
}

Why a new predicate? Because I want to use different predicates to represent different concepts, and enable re-use.

When should a doNothing transition be possible? Only when the game is over!

pred doNothing[pre: Board, post: Board] {
    gameOver[pre] -- guard of the transition
    pre.board = post.board -- effect of the transition
}

If we wanted to, we could add a not gameOver[pre] guard constraint to the move predicate, enforcing that nobody can move at all after someone has won.
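That optional extra guard would look something like this (a sketch of the full predicate, shown with the corrected frame condition from earlier):

```forge
pred move[pre: Board, row: Int, col: Int, p: Player, post: Board] {
  -- guard:
  not gameOver[pre]        -- nobody may move once somebody has won
  no pre.board[row][col]   -- nobody's moved there yet
  p = X implies XTurn[pre] -- appropriate turn
  p = O implies OTurn[pre]

  -- action:
  post.board[row][col] = p
  all row2, col2: Int | (row2 != row or col2 != col) implies {
    post.board[row2][col2] = pre.board[row2][col2]
  }
}
```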

Do The Rules Allow Cheating?

Let's ask Forge whether a cheating state is possible under the rules.

pred cheating[b: Board] {
  -- It's neither X's nor O's turn; the balance is way off! 
  not XTurn[b] 
  not OTurn[b]
}
run {
  wellformed
  traces
  some bad: Board | cheating[bad]
} for exactly 10 Board for {nextState is linear}

This should work—assuming we don't drop the is linear annotation. Without it, nothing says that every state must be in the trace, and so Forge could produce an instance with an "unused" cheating state that's not reachable from the start.

Checking Conjectures

When I was very small, I thought that moving in the middle of the board would guarantee a win at Tic-Tac-Toe. Now I know that isn't true. Could I have used Forge to check my conjecture?

Think, then Click! Here's how I did it:
run {
  wellformed
  traces
  -- "let" lets us locally define an expression, which can
  -- be good for clarity in the model!
  -- here we say that X first moved in the middle
  let second = Game.nextState[Game.initialState] |
    second.board[1][1] = X
  -- ...but X didn't win
  all s: Board | not winner[s, X]
} for exactly 10 Board for {nextState is linear}

We should get a counterexample if we run that predicate.

We could also write this using an assertion (which would fail) rather than a run:

pred moveInMiddle {
  -- X's first move was in the middle
  Game.nextState[Game.initialState].board[1][1] = X
}
pred xWins {
  all s: Board | not winner[s, X]
}
assert moveInMiddle is sufficient for xWins 
  for exactly 10 Board for {nextState is linear}

You might wonder how assert can be used for predicates that take arguments. For example, suppose we had defined wellformed to take a board, rather than quantifying over all boards in its body. The assert syntax can take (one layer of) quantification. Would move preserve wellformed-ness?

TODO: mismatch; prior sections do have a 1-ary wellformed?

Here's how we'd write that. Notice we don't even need to use the Game here (and thus don't need to give the is linear annotation)! We're just asking Forge about 2 boards at a time:

pred someMoveFromWF[pre, post: Board] { 
  wellformed[pre]
  some r, c: Int, p: Player | move[pre, r, c, p, post]
}
assert all pre, post: Board | someMoveFromWF[pre, post] is sufficient for wellformed[post] 

Reminder: The Evaluator

If you're viewing an instance, you can always select the evaluator tray and enter Forge syntax to see what it evaluates to in the instance shown. You can enter both formulas and expressions. We also have the ability to refer to atoms in the world directly. E.g., we could try:

all s: Board | not winner[s, X]

but also (assuming Board0 is an atom in the instance we're currently viewing):

winner[Board0, X]

Going Further

This illustrates a new class of queries we can ask Forge. Given parties following certain strategies, is it possible to find a trace where one strategy fails to succeed vs. another?

Challenge exercise: Write a run that searches for a game where both parties always block immediate wins by their opponent. Is it ever possible for one party to win, if both will act to prevent a 3-in-a-row on the next turn?

Modeling Tip: Dealing with Unsatisfiability

Overconstraint bugs, where some instances may be unintentionally ruled out by our model, can be a nightmare to detect and fix. Maybe you wrote an assert and it seemed to never stop. Maybe you wrote a run command and Forge just produced an UNSAT result—after a long wait.

Getting back an unsat result can take a long time. Why? Think of the search process. If there is a satisfying instance, the solver can find it early. If there isn't, the solver needs to explore the entire space of possibilities. There are smart algorithms for this, and the solver is not really enumerating the entire space of instances, but the general idea holds.

So if you run Forge and it doesn't seem to ever terminate, it's not necessarily a Forge problem. Overconstraint bugs can produce this behavior, too.

So, how do you debug a problem like this? The first thing I like to do is reduce the bounds (if possible) and, if I still get unsat, I'll use that smaller, faster run to debug. But at that point, we're kind of stuck. UNSAT isn't very helpful.

Today I want to show you a very useful technique for discovering the problem. There are more advanced approaches we'll get to later in the course, but for now this one should serve you well.

TODO: insert unsat core, now that we have good highlighting!

The idea is: encode an instance you'd expect to see as a set of constraints, run those constraints only, and then use the evaluator to explore why it fails your other constraints. Let's do an example!

TODO: this is taken from a homework, not one of the above... should rewrite

#lang froglet 

sig State {
  top: lone Element
}
sig Element {
  next: lone Element             
}

pred buggy {
  all s: State | all e: Element {
    s.top = e or reachable[e, s.top, next]
  }
  some st1, st2: State | st1.top != st2.top     
  all e: Element | not reachable[e, e, next]
}
test expect {
  exampleDebug: {buggy} is sat
}

This test fails. But why?

run {
  some st1, st2: State |
  some ele1, ele2: Element | {
    st1.top = ele1
    st2.top = ele2
    ele1.next = ele2   
    no ele2.next    
  }
} for exactly 2 State, exactly 2 Element

Given this instance, the question is: why didn't Forge accept it? There must be some constraint, or constraints, that it violates. Let's find out which one. We'll paste them into the evaluator...

  • some st1, st2: State | st1.top != st2.top? This evaluates to #t (true). No problem there.
  • all s: State | all e: Element { s.top = e or reachable[e, s.top, next] }? This evaluates to #f (false). So this is a problem.

Now we proceed by breaking down the constraint. The outer shell is an all, so let's plug in a concrete value:

  • all e: Element { State0.top = e or reachable[e, State0.top, next] }? This evaluates to #f. So the constraint fails for State0.

Important: atom names like State0 can only be used in the evaluator, where a concrete instance is in view. Don't try to name specific atoms in your model; they don't exist at that point.

Which element does the constraint fail on? Again, we'll substitute concrete values and experiment:

  • State0.top = Element0 or reachable[Element0, State0.top, next]? This evaluates to #t. What about State0.top = Element1 or reachable[Element1, State0.top, next]?

Following this process very often leads to discovering an over-constraint bug, or a misconception the author had about the goals of the model or the meaning of the constraints.

Question: What's the problem here?

Think, then click!

Since the next field never changes with time, the all constraint doesn't allow states to vary the top of the stack. Instead, we need a weaker constraint to enforce that each stack is shaped like a list.

Aside: Reminder About Examples

TODO: should this part go to the Q and A for traces?

Where an assert or run is about checking satisfiability or unsatisfiability of some set of constraints, an example is about whether a specific instance satisfies a given predicate. This style of test can be extremely useful for checking that (e.g.) small helper predicates do what you expect.

Why use example at all? A couple of reasons:

  • It is often much more convenient (once you get past the odd syntax) than adding one sigs or some quantification for every object in the instance, provided you're trying to describe an instance rather than a property that defines a set of them---which becomes a better option as models become more complex.
  • Because of how it's compiled, an example can sometimes run faster than a constraint-based equivalent.

You may be wondering whether there's a way to leverage that same speedup in a run command. Yes, there is! But for now, let's get used to the syntax just for writing examples. Here are some, well, examples:

pred someXTurn {some b: Board | XTurn[b]}
example emptyBoardXturn is {someXTurn} for {
  Board = `Board0
  no `Board0.board
}

Here, we've said that there is one board in the instance, and its board field has no entries. We could have also just written no board, and it would have worked the same.

-- You need to define all the sigs that you'll use values from
pred someOTurn {some b: Board | OTurn[b]}
example xMiddleOturn is {someOTurn} for {
  Board = `Board0
  Player = `X0 + `O0
  X = `X0
  O = `O0
  `Board0.board =  (1, 1) -> `X0
}

What about assertions, though? You can think of assertions as generalizing examples. I could have written something like this:

pred someXTurn {some b: Board | XTurn[b]}
pred emptySingleBoard {
  one b: Board | true
  all b: Board, r,c: Int | no b.board[r][c]
}
assert emptySingleBoard is sufficient for someXTurn  

That's pretty coarse-grained, though. So let's write it in a better way:

pred emptyBoard[b: Board] { all r, c: Int | no b.board[r][c] }
assert all b: Board | emptyBoard[b] is sufficient for XTurn[b]

Notice how, by adding variables to the assertion, we're able to write less-verbose assertions and re-use our predicates better.

But if examples are faster, why use assertions?

First, examples aren't always faster. There are also some models we'll write later where example isn't supported. And, of course, as the model becomes more complex, the example becomes longer and longer as you try to define the value of all fields.

But there's a more important reason: assertions can express properties. Because they can state arbitrary constraints, there's an analogy to property-based testing: where examples are like traditional unit tests, assertions are like the checker predicates you wrote in Hypothesis.

So there's a role for both of them.

Traces: Good and Bad

We've finished our model of tic-tac-toe. We could generate a full game of up to 10 board states, and reason about what was possible in any game.

This works great for tic-tac-toe, and also in many other real verification settings. But there's a huge problem ahead. Think about verifying properties about a more complex system—one that didn't always stop after at most 9 steps. If we want to confirm that some bad condition can never be reached, how long a trace do we need to check?

Think, then click!

What's the longest (simple—i.e., no cycles) path in the transition system? That's the trace length we'd need.

That's potentially a lot of states in a trace. Hundreds, thousands, billions, ... So is this entire approach doomed from the start?

No, for at least two reasons:

  • Often there are "shallow" bugs that can be encountered in only a few steps. In something like a protocol or algorithm, scaling to traces of length 10 or 20 can still find real bugs and increase confidence in correctness.
  • There's more than one way to verify. Generating full traces wasn't the only technique we used to check properties of tic-tac-toe; let's look deeper at something we saw a while back.

Proving Preservation Inductively

Let's turn to a programming problem. Suppose that we've just been asked to write the add method for a linked list class in Java. The code involves a start reference to the first node in the list, and every node has a next reference (which may be null).

Here's what we hope is a property of linked lists: the last node of a non-empty list always has null as its value for next.

How can we prove that our add method preserves this property, without generating traces of ever-increasing length? There's no limit to how long the list might get, and so the length of the longest path in the transition system is infinite: 0 nodes, 1 node, 2 nodes, 3 nodes,...

This might not be immediately obvious. After all, it's not as simple as asking Forge to check all s: State | last.next = none.

Exercise: Why not?

Think, then click!

Because that would just be asking Forge to find us instances full of good states. Really, we want a sort of higher-level all, something that says: "for all runs of the system, it's impossible for the run to contain a bad linked-list state."

This illustrates a central challenge in software and hardware verification. Given a discrete-event model of a system, how can we check whether all reachable states satisfy some property? You might have heard properties like this called invariants of the system.

One way to solve the problem without the limitation of bounded-length traces goes something like this:

  • Step 1: Ask whether any starting states are bad states. If not, then at least we know that executions with no moves obey our invariant. (It's not much, but it's a start. It's also easy for Forge to check.)
  • Step 2: Ask whether it's possible, in any good state, to transition to a bad state.

Consider what it means if both checks pass. We'd know that runs of length 0 cannot involve a bad state. And since we know that good states can't transition to bad states, runs of length 1 can't involve bad states either. And for the same reason, runs of length 2 can't involve bad states, nor games of length 3, and so on.

How do we write this in Forge?

This technique isn't only applicable in Forge. It's used in many other solver-based tools, including those used in industry. And modeling linked lists in Forge is very doable, but more complicated than I'd like to do at this point. So we'll demonstrate the idea on the tic-tac-toe model.

Step 1: are there any bad states that are also starting states?

assert all b: Board | initial[b] is sufficient for balanced[b]
  for 1 Board, 3 Int

Notice that we didn't need to use the nextState is linear annotation, because we're not asking for traces at all. We've also limited our scope to a single Board. We also don't need 4 integer bits; 3 suffices. This should be quite efficient. It should also pass, because the empty board isn't unbalanced.

Step 2: are there any transitions from a good state to a bad state?

Again, we don't need a full trace for this to work. We only need 2 boards: the pre-state and post-state of the transition:

pred moveFromBalanced[pre: Board, row, col: Int, p: Player, post: Board] {
  balanced[pre]
  move[pre, row, col, p, post]
}
assert all pre, post: Board, row, col: Int, p: Player | 
  moveFromBalanced[pre, row, col, p, post] is sufficient for balanced[post]
    for 2 Board, 3 Int

If both of these pass, we've just shown that bad states are impossible to reach via valid moves of the system.

Aside: Performance

That second step is still pretty slow on my laptop: around 10 or 11 seconds to yield UNSAT. Can we give the solver any help? Hint: is the set of possible values for pre bigger than it really needs to be?

Think, then click!

If we assume the pre board is well-formed, we'll exclude transitions involving invalid boards. There are a lot of these, even at 3 Int, since row and column indexes will range from -4 to 3 (inclusive). We could do this either by asserting wellformed[pre] or by refining the bounds we give Forge.
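If we take the first option, the helper predicate just gains one more conjunct (a sketch, reusing wellformed, balanced, and move from this model):

```forge
pred moveFromBalanced[pre: Board, row, col: Int, p: Player, post: Board] {
  wellformed[pre]  -- exclude ill-formed pre-states; shrinks the search space
  balanced[pre]
  move[pre, row, col, p, post]
}
```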

Counterexamples To Induction

This section contains a running exercise where we model binary search on an array. You'll need this exercise template to begin. For your reference, the completed version, with comments from Q&A, is here.

TODO: insert reminder of what binary search is

Conceptual Setup

When we're talking about whether or not a reachable state violates a desirable property P (recall we sometimes say that if no reachable state violates P, P is an invariant of the system), it's useful to think geometrically. Here's a picture of the space of all states, with the cluster of "good" states separated from the "bad":

If this space is large, we probably can't use trace-finding to get a real proof: we'd have to either:

  • reduce the trace length (in which case, maybe there's a bad state just barely out of reach of that length but we'd never know); or
  • possibly be waiting for the solver until the sun expands and engulfs the earth.

Complete-trace checking is still useful, especially for finding shallow bugs. The technique is used in industry, and inspires other, more sophisticated algorithmic techniques. But we need more than one tool in our bag of tricks, so let's keep developing the idea of only looking at 1 or 2 states at a time.

Original paper

If you're interested, you can read the original paper on using solvers to find traces of bounded length, by Biere, et al.

To do that, we'll go back over what we did in the last section, but in more detail.

Step 1: Initiation or Base Case

Let's break the problem down. What if we just consider reachability for traces of length 0---that is, traces of only one state, an initial state?

This we can check in Forge just by asking for a state s satisfying {initial[s] and wellformed[s] and not P[s]}. There's no exponential blowup with trace length since the transition predicates are never even involved! If we see something like this:

We know that at least the starting states are good. If instead there was a region of the starting states that overlapped the bad states, then we immediately know that the property isn't invariant.
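In Forge syntax, for the tic-tac-toe model, the base-case query might look like this (a sketch, assuming the initial and balanced predicates from the earlier checks, with balanced playing the role of P):

```forge
-- Satisfiable only if some well-formed starting state violates the property.
-- We hope this run reports UNSAT.
run {
  some s: Board | initial[s] and wellformed[s] and not balanced[s]
} for 1 Board, 3 Int
```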

Step 1.5: Restrictions Spur Creativity

We can also check whether there are bad states within 1 transition. We're using the transition predicate (or predicates) now, but only once. Forge can also do this; we ask for a pair of states s0, s1 satisfying {initial[s0] and someTransition[s0, s1] and not P[s1]} (where someTransition is my shorthand for allowing any transition predicate to work).

If Forge doesn't find any way for the second state to violate P, it means we have a picture like this:

Note that in general, there might be overlap (as shown) between the set of possible initial states and the set of possible second states. For example, imagine if we allowed a doNothing transition at any time—then the starting state could be reached in any number of steps.

We could continue to extend this idea to 2 transitions (3 states), 3 transitions (4 states), and so on. If we keep following this process, we'll arrive back at the fully trace-based approach. And anyway, to address the exponential-blowup problem of traces, we said that we would only ever look at 1 or 2 states. But how can we ever get something useful, that is, a result that isn't limited to trace length 1?

We can (often) use these small, efficient queries to show that P holds at any finite distance from a starting state. But how? By giving something up. In fact, we'll give up something apparently vital: we'll stop caring whether the pre-state of the bad transition is reachable or not.

Step 2: Consecution or Inductive Case

We'll ask Forge whether {P[s0] and someTransition[s0, s1] and not P[s1]} is satisfiable for any pair of states, so long as the pre-state satisfies P and the post-state doesn't. We're asking Forge if it can find a transition that looks like this:

If the answer is no, then it is simply impossible (up to the bounds we gave Forge) for any transition predicate to stop property P from holding: if it holds in the pre-state, it must hold in the post-state.
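Expressed as a concrete run for tic-tac-toe (a sketch, with move as the only transition predicate and balanced playing the role of P; UNSAT here means the property is inductive):

```forge
run {
  some s0, s1: Board | {
    balanced[s0]                      -- pre-state satisfies the property
    some row, col: Int, p: Player |
      move[s0, row, col, p, s1]       -- some transition fires
    not balanced[s1]                  -- post-state violates the property
  }
} for 2 Board, 3 Int
```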

But if that's true, and we know that all initial states satisfy P, then all states reachable in 1 transition satisfy P (by what we just checked). And if that's true, then all states reachable in 2 transitions satisfy P also (since all potential pre-states must satisfy P). And so on: any state that's reachable in a finite number of transitions must satisfy P.

If you've seen "proof by induction" before in another class, we've just applied the same idea here. Except, rather than using it to show that the sum of the numbers from 0 to n is n(n+1)/2 (or some other toy algebraic example) we've just used it to prove that P is invariant in a system.

In Tic-Tac-Toe, we let the property P be "cheating states can't be reached with legal moves". In an operating system, P might be "two processes can never modify a block of memory simultaneously". In hardware, it might be "only one device has access to write to the bus at any point". In a model of binary search, it might be "if the target is in the array, it's located between the low and high indexes".

A CS Perspective

I think that showing a system preserves an invariant might be a far more relatable and immediately useful example of the induction principle than summation. That's not to dismiss mathematical induction! I quite like it (and it's useful for establishing some useful results related to Forge). But multiple perspectives enrich life.

Exercise: Try it! Open up the binary search model starter template. You should see a pair of tests labeled initStep and inductiveStep under a comment that says "Exercise 1". Fill these in using the logic above, and run them. Do they both pass? Does one fail?

What if Forge finds a counterexample?

What if Forge does find a transition that fails to preserve our property P? Does it mean that P is not an invariant of the system?

Think, then click!

No! It just means that the pre-state that Forge finds might not itself be reachable! In that case, we'll say that while P might be an invariant, it's not an inductive invariant.

So, this 2-states-at-a-time technique can be a great way to quickly show that P is invariant. But if it fails, we need to do more work!

We see this happen when we try to run the above checks for our binary-search model! The inductiveStep test fails, and we get a counterexample.

Fix 1: Maybe the Property is Wrong!

Sometimes there are conditions about the world that we need in order for the system to work at all. For example, we already require the array to be sorted, or binary search breaks. Are there other requirements?

It turns out that there are. Here's one: the classic way to write binary search is actually broken in most systems. If I write mid = (low + high) / 2, and I'm working with machine integers (which have a maximum value), what happens if the array is large relative to the maximum integer in the system?

Think, then click!

On a 32-bit system, the maximum int value is 2,147,483,647, and any value over that "wraps around". So if the array is just a couple billion elements long (easily reachable even in the early 2000's at Google scale), the value of (high+low) can overflow. For example:

low = 2,500,000,000
high = 3,000,000,000
low + high = 5,500,000,000 - 4,294,967,296 = 1,205,032,704

But this index isn't between low and high, and so the algorithm breaks. In Forge, we can adjust the number of bits available, and see the problem much sooner.
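To see the wraparound concretely outside Forge, here's a small Python simulation (Python integers don't overflow, so we apply the 32-bit modulus explicitly; the variable names are illustrative):

```python
low, high = 2_500_000_000, 3_000_000_000

# Simulate 32-bit wraparound of the sum:
wrapped = (low + high) % (1 << 32)
assert wrapped == 1_205_032_704

mid = wrapped // 2
assert not (low <= mid <= high)      # the computed index is out of range!

# The standard fix computes the midpoint without the overflowing sum:
safe_mid = low + (high - low) // 2
assert low <= safe_mid <= high
```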

We're not trying to fix this problem in the algorithm. This is a real bug. Our modeling found it (admittedly, 20 years after the linked blog post). So let's remember the issue, and proceed. If the array is small enough, is the algorithm correct? We'll add a global assumption that the array is not too large. We'll state that assumption in the safeArraySize predicate, which we'll then use in the two checks.

How can we express what we want: that the array is not too large?

Think, then click!

The core challenge here is that we'll never have enough integers to actually count #Int. However, we can ask for the maximum integer---max[Int]. So we could say that arr.lastIndex is less than divide[max[Int], 2]. This might be a little conservative, but it guarantees that the array is never larger than half of the maximum integer, and so it works for our purposes here.
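To sanity-check why this bound suffices: if arr.lastIndex is less than max[Int]/2, then low + high <= 2*lastIndex < max[Int], so the sum in the midpoint computation can't wrap around. A quick Python check of that arithmetic across a few hypothetical bitwidths:

```python
# Forge integers at bitwidth b cover [-2**(b-1), 2**(b-1) - 1].
def max_int(bitwidth):
    return 2 ** (bitwidth - 1) - 1

for b in range(3, 9):
    m = max_int(b)
    last_index = m // 2 - 1     # lastIndex < max[Int] / 2
    # Worst case: low and high both sit at lastIndex.
    assert last_index + last_index <= m   # the sum cannot wrap
```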


Exercise: Try expressing this in the safeArraySize predicate, and adding it to the test that failed. Add a comment explaining why this is a requirement.

Fix 2: Enriching the Invariant

Sometimes the property we're hoping to verify is invariant in the system, but it's not preserved by the system transitions when we look at them in isolation. This would mean that the pre-state in any counterexample isn't actually reachable. In the full-trace approach, we didn't have this problem, since the trace was rooted in an initial state. This is the tradeoff of the inductive approach! We gain a big performance boost, but sometimes we have to do more work to make progress.

Concretely, we'll need to verify something stronger. At the very least, if S is the pre-state of the counterexample we were given, we need to change our property to "I and the state isn't S" (where I is the old property). In practice, we try to add something more general that expresses the root cause of why S shouldn't be reachable.

The technique is sometimes called "enriching the invariant".

Exercise: Do you believe the counterexample you're getting is reachable? If not, what is it about the state that looks suspicious?

Exercise: Now add new constraints to the bsearchInvariant predicate that exclude this narrow criterion. Be careful not to exclude too much! An over-constrained property can be easy to verify, but may not actually give many guarantees.

But Can We Trust The Model?

Look again at the two checks we wrote. If initial were unsatisfiable, the Step 1 check would also be unsatisfiable, since it just adds more constraints. Similarly, an unsatisfiable transition predicate would rob Step 2 of its power to find ways that the system could transition out of safety. If either of these bugs existed, Forge would find no bad initial states and no bad transitions. It would look like the property was invariant, but really the checks would pass because our model was overconstrained.

More concretely, Step 1 checks that all s: State | initial[s] implies good[s]. But if no s can possibly satisfy initial, then the overall constraint evaluates to true—no counterexamples are possible! This problem is called vacuity or vacuous truth, and it's a threat to modeling success.
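This behavior of classical quantifiers is easy to reproduce in Python, where all over an empty sequence is True. Here, a hypothetical buggy initial predicate is unsatisfiable, so a Step-1-style check passes with zero counterexamples, and a separate satisfiability check catches the bug:

```python
def initial(s): return s > 0 and s < 0   # over-constrained: never true (a bug!)
def good(s):    return s % 2 == 0

states = range(10)

# The "invariant check" passes... vacuously:
assert all(good(s) for s in states if initial(s))

# The vacuity check catches the bug: nothing satisfies `initial`.
assert not any(initial(s) for s in states)
```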

Put another way...

Suppose I told you: "All my billionaire friends love Logic for Systems". I have, as far as I know anyway, no billionaire friends. So is the sentence true or false? You might (quite reasonably) say that it shouldn't be either true or false, but some sort of third value that indicates inapplicability. There are logics that work that way, but they tend to either be complicated, or to complicate building tools using them. So Forge uses classical logic, where A implies B is equivalent to (not A) or B.

The result is: Forge would say that the sentence is true. After all, there's no billionaire friend of mine who doesn't love Logic for Systems...

Watch out! This is a problem you might first hear about in a logic or philosophy textbook, so there's a risk you'll think vacuity is silly: a topic for debate among people who like drawing their As upside down and their Es backwards, and who love writing formulas full of Greek letters. Don't be fooled! Vacuity is a major problem even in industrial settings, because verification tools are literal-minded.

TODO: insert industry links from EdStem ref'd in 2024

At the very least, we'd better test that the left-hand-side of the implication can be satisfied. This isn't a guarantee of trustworthiness, but it's a start. And it's easy to check with Forge that some state can be initial or some transition can be executed:

assert {some s: State | initial[s]} is sat
assert {some pre, post: State | transition[pre, post]} is sat

Make sure you're always testing vacuity. Errors like this are more common than you might think.

BSTs: Recursive Descent

When we last modeled binary search trees, we defined what it meant to be a binary tree, and had two different candidates for the BST invariant. Now we'll return to BSTs and model the classic recursive BST search algorithm. Just like games of tic-tac-toe, a BST search can be represented as a sequence of states that evolve as the algorithm advances.

As a reminder, we had defined binary-tree nodes like this:

sig Node {
  key: one Int,     -- every node has some key 
  left: lone Node,  -- every node has at most one left-child
  right: lone Node  -- every node has at most one right-child
}

Just like in the last example, we'll start by adding a sig for the state of the system. The "system" here is the recursive search, so it should have fields that are used in that context. Really, the only thing that changes during the recursive descent is the node currently being visited:

-- Since a BST descent doesn't need to backtrack, the state can be fairly simple.
sig SearchState {
    current: lone Node -- the node currently being visited
}

Then we'll define a one sig for the overall search. As with tic-tac-toe, Forge will find instances that represent a single search, embodied by the Search atom and its fields:

one sig Search {
    target: one Int, -- the target of the search (never changes)
    -- The first state and successor-state function for this trace
    initialState: one SearchState,
    nextState: pfunc SearchState -> SearchState
}

What does an initial state of the search look like? We'd better be at the root of the tree!

pred init[s: SearchState] {    
    isRoot[s.current]
}

Now for the more complicated part. How does a step of the recursive descent work? At any given node:

  • First, it checks whether current.key = target. If yes, it's done.
  • It checks whether target < current.key. If yes, it moves to the left child if one exists, and reports failure otherwise.
  • It checks whether target > current.key. If yes, it moves to the right child if one exists, and reports failure otherwise.
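The three steps above can be sketched as an ordinary recursive function. This Python version is illustrative (the Node class here is a hypothetical stand-in for the sig above):

```python
# Hypothetical Python rendering of the recursive descent described above.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search(current, target):
    if current is None:
        return None                            # hit bottom: not found
    if current.key == target:
        return current                         # found it
    if target < current.key:
        return search(current.left, target)    # descend left
    else:
        return search(current.right, target)   # descend right
```

For example, with tree = Node(5, Node(3), Node(8)), search(tree, 3) returns the node holding 3, while search(tree, 4) returns None.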

That's not so bad, but it feels like there are two different kinds of transition that our system might take. Let's give each of them their own predicates, just to avoid them getting tangled with each other:

  • descendLeft will apply if the target is to the left.
  • descendRight will apply if the target is to the right. If neither can apply, the algorithm is done: either the target has been found, or the search has "hit bottom" without finding the target.

Let's start writing them, beginning with descendLeft. We'll follow the discipline of separating the guard and action of each transition pred:

pred descendLeft[pre, post: SearchState] {
  -- GUARD 
  Search.target < pre.current.key
  some pre.current.left
  -- ACTION
  post.current = pre.current.left
}

Because only the current node is a component of the search state, we only need to define the new current node in the action.

Exercise: Write descendRight yourself. The structure should be very similar to descendLeft.

Think, then click!

You might write something like this:

pred descendRight[pre, post: SearchState] {
  -- GUARD 
  Search.target > pre.current.key
  some pre.current.right
  -- ACTION
  post.current = pre.current.right
}

Let's do some basic validation:

test expect {
    -- let's check that these two transitions are mutually exclusive
    r_l_together: {some pre, post: SearchState | {descendLeft[pre, post] and descendRight[pre, post]}} for 7 Node is unsat
    -- let's check that each transition is possible to execute
    r_sat: {some pre, post: SearchState | descendRight[pre, post]} for 7 Node is sat
    l_sat: {some pre, post: SearchState | descendLeft[pre, post]} for 7 Node is sat
    -- initial state is satisfiable
    init_sat: {some s: SearchState | init[s]} for 7 Node is sat
}

Now we'll combine these predicates into one that defines the entire recursive descent. The shape of this predicate is somewhat boilerplate; soon we'll see how to get rid of it entirely. For now, we'll just copy from the tic-tac-toe example and make small, local changes. Namely:

  • we called the trace sig Search, not Game;
  • we called the state sig SearchState, not Board; and
  • we have two different transition predicates to include.

pred traces {
    -- the graph is well-formed to begin with
    binary_tree
    -- The trace starts with an initial state
    init[Search.initialState]
    no sprev: SearchState | Search.nextState[sprev] = Search.initialState
    -- Every transition is a valid move
    all s: SearchState | some Search.nextState[s] implies {
        descendLeft [s, Search.nextState[s]] or 
        descendRight[s, Search.nextState[s]]
    }
}

Let's run it!

run {traces} for exactly 7 Node, 5 SearchState for {nextState is plinear}

The output may initially be overwhelming: by default, it will show all the atoms in the world and their relationships, including each SearchState. You could stay in the default visualizer and mitigate the problem a little by clicking on "Theme" and then "Add Projection" for SearchState. The problem is that this hides the current node indicator for the current state, since the current state becomes implicit.

Instead, let's use a custom visualization. There are multiple options included with this book:

  • bst.js, which visualizes the tree itself, without any regard to the descent. This is useful for debugging the basic tree model and the invariants themselves.
  • bst_descent.js, which visualizes the descent in one picture.
  • (Don't run this yet!) bst_temporal.js, which visualizes a Temporal Forge version of the model, which we'll get to soon.

If we run bst_descent.js for this instance, it will draw the tree and highlight the path taken in the recursive descent. A node with the target key will have a thick border. A node that's visited in the descent will have a red border. So a correct descent should never show a node with a thick border that isn't red.

TODO fill: how to run? Did we describe this already?

This is easier to read, but also a little worrying: we see two nodes visited, and they aren't directly connected!

Exercise: What's going on? (Hint: examine the table view. How many search states are there? Are they all in the trace?)

Think, then click!

The problem is that we allowed SearchState atoms to exist without being used in the trace. The visualizer script is highlighting a node that any SearchState uses. So we have two options:

  • change the visualizer script to only look at states reachable from the first one; or
  • add a constraint that forces all SearchState atoms to be used.

We'll go for the second fix, adding this line to the traces predicate:

-- All SearchStates are used
all s: SearchState | { 
  s = Search.initialState or 
  reachable[s, Search.initialState, Search.nextState]
}

Ok, that's better, but we still aren't seeing a complete descent. To fix that, we'll say that, eventually, the descent either reaches bottom or a node with the target value:

run {
  binary_tree 
  traces
  some s: SearchState | s.current.key = Search.target or (no s.current.left and no s.current.right)
} for exactly 7 Node, 5 SearchState for {nextState is plinear}

That's more like it. But what about the invariants? We only said binary_tree had to hold, which means that the tree being searched isn't necessarily a binary search tree yet.

Trying Different Invariants

In our original BST model, we'd sketched two different invariants:

Version 1 (invariant_v1): For all nodes N:

  • all left-descendants of N have a key less than N's key; and
  • all right-descendants of N have a key greater than or equal to N's key.

Version 2 (invariant_v2): For all nodes N:

  • the left child of N (if any) has a key less than N's key; and
  • the right child of N (if any) has a key greater than or equal to N's key.

We were able to look at trees that met one invariant but not another, but now we can do something much more powerful: we can ask Forge to show us how the differing invariants affect the recursive descent on the tree! If an invariant is "wrong", surely it will cause the descent to fail in some way. Since we've already modeled the descent, this should be easy. Let's try it for invariant_v2:

run {
  binary_tree     -- it must be a binary tree
  all n: Node | invariant_v2[n]    -- additionally, the tree satisfies invariant version 2
  some n: Node | n.key = Search.target -- the target is present
  traces          -- do a search descent
  -- Finally, the trace finishes the search
  some s: SearchState | s.current.key = Search.target or no (s.current.left + s.current.right)
} for exactly 7 Node, 5 SearchState for {nextState is plinear}

This will show us instances of a descent for a tree following invariant_v2. To see descents for trees following the other invariant, just change invariant_v2 to invariant_v1. If you look at a few instances for invariant_v2, you should notice that this invariant can let the descent fail: the tree contains the target, but it's never found. You'll see something like this:

a binary tree following invariant 2, where the recursive descent fails to find the target

Verifying BSTs

Notice what just happened. We built up our structural model to contain a collection of related features, such as:

  • binary trees with numeric node values;
  • multiple possible invariants for these trees to follow; and
  • a recursive-descent algorithm on those binary trees.

Before, we could only ask Forge to show us that the invariants were different, which wasn't very useful—at least not immediately. Then, in this section, we added a discrete event model of BST search atop the original, which gave us something more powerful: representations of how the different invariants might impact the search algorithm. We can even verify that invariant_v1 is correct (for reasonably-sized example trees).

Exercise: Do this now! Write either a run or test expect confirming that if invariant_v1 holds for all nodes, then the recursive descent will be successful at finding a present target value.

Think, then click!

I'll write my version as a run, so we can better match the one above. Note that this run searches for a failing descent: one that hits bottom without ever finding the target, even though the target is present. If invariant_v1 is correct, Forge should find no such instances.

run {
  binary_tree    
  all n: Node | invariant_v1[n]   
  some n: Node | n.key = Search.target -- the target is present
  traces     
  -- The trace finishes the search without finding the target
  some s: SearchState | no (s.current.left + s.current.right)
  no s: SearchState   | s.current.key = Search.target
} for exactly 7 Node, 5 SearchState for {nextState is plinear}

Bonus Exercise: Do the numeric bounds in the commands above seem OK to you? Is there any situation you might be worried about, beyond example trees that are bigger than 7 nodes?

Think, then click!

Trees aren't always balanced. It's possible for the 7 nodes to be arranged like a linear list, with the target value at the bottom-most node. In that case, 5 states wouldn't be enough, and Forge would never even look for such a situation: we said that the descent ends when it reaches the bottom of the tree, and we lack the states to get there.

This is a great example of how carefully considering bounds and exploring the structural model before doing more complex verification is vital. To really be complete for 7-node trees, we would need up to 7 SearchState atoms as well.
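A quick Python sanity check of that bound, using a hypothetical "list-shaped" tree in which every node has only a right child:

```python
# Build a degenerate 7-node tree: 1 -> 2 -> ... -> 7, all right children.
class Node:
    def __init__(self, key, right=None):
        self.key, self.right = key, right

root = None
for k in range(7, 0, -1):
    root = Node(k, root)

# Count the states a descent to the deepest key would need.
states, current = 0, root
while current is not None:
    states += 1
    if current.key == 7:
        break
    current = current.right

assert states == 7   # more than the 5 SearchState atoms we allotted
```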

Looking Ahead

Of course, this is still a very simple model. Binary Search Trees are much less complicated than many other data structures, and even complex data structures are only part of a larger system.

Validating Models

  • Now that we have the full discrete-event story, we can talk more carefully about model validation.
  • Adapt from work with Pamela and credit her.

Interlude on Testing and Properties

As we model more complex systems, our models themselves grow more complex. The more complex the model, the more important it is to test it carefully. Just as in software testing, you can never be 100% sure that you have tested everything. Instead, you proceed using your experience and following a methodology.

Let's get some practice with this. Before we start modifying our locking algorithm model, we should think carefully, both about testing and about how the model reflects the real world.

Principle 1: What's the Domain? What's the System?

When we're writing a model, it's useful to know when we're talking about the system, and when we're talking about the domain that the system is operating on.

  • The domain has a set of behaviors it can perform on its own. In this example, the threads represent the domain: programs running concurrently.
  • The system affects that set of behaviors in a specific way. In this example, the locking algorithm is the system. Usually the system functions by putting limits and structure on otherwise unfettered behavior. (E.g., without a locking algorithm in place, in reality threads would still run and violate mutual exclusion.)
  • Because we usually have goals about how, exactly, the system constrains the domain, we state requirements about how the domain behaves in the system's presence. Think of these as the top-level goals that we model in order to prove (or disprove) properties of the system, or otherwise explore it.
  • Because the domain cannot "see" inside the system, we'll try to avoid stating requirements in terms of internal system variables. However, models are imperfect! We will also have some validation tests that are separate from the requirements. These may or may not involve internal system state.

We'll develop these ideas more over the next few weeks. For now, keep in mind that when you add something to a model, it's good to have a sense of where it comes from. Does it represent an internal system state, which should probably not be involved in a requirement, but perhaps should be checked in model validation?

Have we been disciplined about this so far?

No we have not! And we may be about to encounter some problems because of that.

Principle 2: What Can Be Observed?

Ask what behaviors might be important in the domain, but not necessarily always observed. These are sometimes referred to as optional predicates, because they may or may not hold. In real life, here are some optional predicates:

  • we have class today;
  • it's raining today;
  • homework is due today; etc.

Exercise: What are some optional predicates about the domain in our locking algorithm model?

Think, then click!

We might see, but won't necessarily always see:

  • combinations of different transitions;
  • threads that are both simultaneously interested;
  • threads that are uninterested;
  • etc.

As the model includes more domain complexity, the set of optional predicates grows.

Let's write some actual tests to confirm that these behaviors can happen (or not), and check whether that matches our modeling intuition.

For Next Time

We'll consider questions of atomicity (how granular should transitions be?), and model a different variation of this lock.

Q&A: Event Systems

TODO: consider advanced question: correspondence??? is there one between the array version and the tree version?

More on Optimization: inst Syntax

The syntax that you use in examples, and to say is linear for a field (like you saw in the ripple-carry adder), is more expressive than you've seen so far. Recall that Forge has two phases: delineating the allowed search space, and solving the constraints within that search space. This new syntax lets you define a partial instance, which affects the first phase by limiting the possible instances the solver will consider.

You can even define a reusable partial instance using the inst command, and use it in the same place you'd use {next is linear} in a run, assert, etc. For example, we could pre-define the set of full adders in the ripple-carry adder model, giving each an atom name:

inst fiveBits {
  FA = `FA0 + `FA1 + `FA2 + `FA3 + `FA4
} 
run {rca} for exactly 1 RCA for {fiveBits}

Because this defines the set of FA atoms that must exist, it does something quite similar to just saying for exactly 5 FA. If we then add entries for the RCA's fields, we manually accomplish what nextAdder is linear would as well, while simultaneously making sure the first full adder in that ordering is what we see in RCA.firstAdder:

inst fiveBits {
  RCA = `RCA0 
  FA = `FA0 + `FA1 + `FA2 + `FA3 + `FA4
  -- Remember the back-tick mark here before atom names! 
  `RCA0.firstAdder = `FA0
  `RCA0.nextAdder = `FA0 -> `FA1 + `FA1 -> `FA2 + `FA2 -> `FA3 + 
                    `FA3 -> `FA4
} 
run {rca} for exactly 1 RCA for {fiveBits}

So far this is just a long-handed way of doing what we've already done. But partial instances are far more flexible. Remember the optimization we had to do in tic-tac-toe games, where we reduced the bitwidth from 4 to 3? Partial instances give us a much finer degree of control. Where for 3 Int still allows any integer between -4 and 3 to be used, we can use a partial instance to force well-formedness, before the solver ever sees a constraint:

inst ttt_indexes {
  Board = `Board0
  X = `X   O = `O
  Player = X + O
  board in Board -> (0 + 1 + 2) -> (0 + 1 + 2) -> Player
}

This says that any entry in any board's board field can only use the values 0 through 2, inclusive. It has an impact very similar to the wellformed pred we wrote, but can be much more efficient. (You'll learn more about why in the next chapter.)
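To get intuition for why this helps, recall that each potential tuple in a field's relation becomes (roughly) one boolean variable for the solver. A back-of-the-envelope count in Python, assuming one Board, bitwidth 3 (integers -4 through 3), and two players:

```python
ints = range(-4, 4)     # 8 integers at bitwidth 3
players = 2
boards = 1

# Without the partial instance: any Int may appear in either index position.
unbounded = boards * len(ints) ** 2 * players

# With it: rows and columns are restricted to 0, 1, 2 before solving begins.
bounded = boards * 3 * 3 * players

assert (unbounded, bounded) == (128, 18)
```

Fewer potential tuples means fewer boolean variables, before the solver ever runs.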

Exercise: Compare the statistical info for the original tic-tac-toe game run with and without this added partial instance information. Do you notice any changes? What do you think might be going on, here?

Think, then click!

The statistical information is reporting runtime, but also something else: the number of "clauses" and "variables". It turns out these express how big the boolean constraint problem is before the solver gets it. Partial instances can reduce these, and thus make the problem easier for the solver.

Modeling Boolean Logic (Syntax, Semantics, and Sets)

In this chapter, we'll start writing a new model from scratch to meet 3 broad goals:

  • expanding Forge's expressive power to support sets and relations;
  • modeling more recursive concepts in a language without recursion; and
  • modeling a syntax for a language, along with its semantics.

You can find the completed models here.

Modeling Boolean Formulas

If you've spent time writing programs, then you've already spent a lot of time working with boolean formulas. E.g., if you're building a binary search tree in Java, you might write something like this to check the left-descent case:

if(this.getLeftChild() != null &&
   this.getLeftChild().value < goal) { 
    ... 
}

The conditional inside the if is a boolean formula with two boolean variables (also sometimes called atomic propositions) corresponding to leftChild == null and leftChild.value < goal. A ! (not) negates the equality check, and a single && (and) combines the conditions. The syntax of the conditional forms a tree, with atoms for the two boolean variables, the not and the and:
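That tree can be written down as ordinary data. A minimal sketch in Python (the class and variable names here are illustrative, not part of the model we're about to build):

```python
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Not:
    child: object

@dataclass
class And:
    left: object
    right: object

# (not "leftChild is null") and ("leftChild.value < goal")
tree = And(Not(Var("leftChild is null")), Var("leftChild.value < goal"))
assert isinstance(tree.left, Not)
```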


Let's define some types for formulas. Like we've done before when defining a hierarchy of types, we'll make an abstract sig to represent the collection of all kinds of formulas, and have child sigs that represent specific kinds:

-- Syntax: formulas
abstract sig Formula {}
sig Var extends Formula {} 
sig Not extends Formula {child: one Formula} 
sig Or extends Formula {o_left, o_right: one Formula}
sig And extends Formula {a_left, a_right: one Formula}
-- If we really wanted to, we could add `Implies`, `IFF`, etc. in the same way.

Field names

Forge doesn't allow re-use of field names between sigs. If you try, you'll get an error that says the name is already used. So we need to name the left and right children of the And and Or types differently. Hence the o_ and a_ prefixes.

Wellformedness

As always, we need a notion of wellformedness. What would make a formula tree "garbage"? Well, if the syntax tree contained a cycle, the tree wouldn't be a tree, and the formula wouldn't be a formula! We'll write a wellformed predicate where an assertion like this will pass:

pred trivialLeftCycle { 
    some a: And | a.a_left = a
}
pred notWellformed { not wellformed }
assert trivialLeftCycle is sufficient for notWellformed

(TODO: I don't like the need to create a secondary helper notWellformed. Discuss with SP.)

Like in binary trees, there are multiple fields that a cycle could use. There, we only needed to worry about left and right; here there are more. Let's build a helper predicate that evaluates whether one formula is a smaller part of another:

-- IMPORTANT: remember to update this if adding new formula types!
pred subFormulaOf[sub: Formula, f: Formula] {
  reachable[sub, f, child, a_left, o_left, a_right, o_right]
}

At first, this might seem like a strange use of a helper. There's just one line, and all it does is call the reachable built-in predicate. However, we probably need to check for subformulas in multiple places in our model. And, we might anticipate a need to add more formula types (maybe we get around to adding Implies). Then we need to remember to add the fields of the new sig everywhere that reachable is used. And if we leave one out, we probably won't get an error. So making this helper is just good engineering practice; this way, we minimize the number of places that need the change.

pred wellformed {
  -- no cycles
  all f: Formula | not subFormulaOf[f, f]
}

Remember what wellformed is for

Recall that we use wellformed only to exclude "garbage" instances, analogously to filtering an input generator in property-based testing. The properties we might want to verify (or build a system to enforce) go elsewhere; otherwise, Forge would exclude any potential counterexamples.

We'll want to add wellformed to the first example we wrote, but it should still pass. Let's run the model and look at some formulas! We could just run {wellformed}, but that might be prone to giving uninteresting examples. Let's try identifying the root node in our run constraint, which would let us ask for something more complex:

run {
  wellformed
  some top: Formula | {
    all other: Formula | top != other => {
      subFormulaOf[other, top]
    }
  }
} for exactly 8 Formula

Try running this. You should see some example formula trees; we've modeled the syntax of boolean logic.

Exercise: Write at least one more positive and one more negative test for wellformed in this model.

Exercise: Why didn't we add a constraint to prevent nodes from having multiple parents? That is, why didn't we prevent an instance like this? (And, in light of that, is it really fair to call these "trees"?)

TODO: add image

Think, then click!

If we wanted to exclude "sharing" of child formulas, we'd need to allow multiple nodes to have the same meaning. E.g., if we had a variable x, we'd need to allow multiple Var atoms to represent x, which would greatly complicate the model and increase the size of its instances. Instead, we let one Var atom be re-used in multiple contexts.

So, while the instances aren't (strictly speaking) trees, they represent trees in an efficient way.

The Meaning Of Boolean Circuits

What's the meaning of a formula? So far they're just bits of syntax in isolation; we haven't defined a way to understand them or interpret them. This distinction between syntax and its meaning is really important, and touches every aspect of computer science. Indeed, it deeply affects anywhere we use a language.

To see why, let's go back to that Java BST example, with the boolean conditional:

if(this.getLeftChild() != null && this.getLeftChild().value < goal) { 
    ... 
}

Exercise: Suppose that the getLeftChild() method increments a counter whenever it is called. Suppose the counter is 0 before this if statement runs. What will it be afterward?

Think, then click!

It depends!

  • If the left-child is non-null, the counter will hold 2 afterward, because getLeftChild() will be called twice.
  • If the left-child is null, and we're working in a language like Java, which "short circuits" conditionals, the counter would be 1 since the second branch of the && wouldn't execute.

In another language, one without short-circuiting conditionals, the counter might be 2 in both cases. And in yet another language, where the result of getLeftChild() is cached and the method called only once, the counter might be 1 in both cases!
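Python's and short-circuits like Java's &&, so we can reproduce the counter experiment directly (the counter and method here are illustrative):

```python
calls = 0

def get_left_child():
    global calls
    calls += 1
    return None           # simulate a node with no left child

# Short-circuiting: the right side of `and` never runs when the left is False.
result = get_left_child() is not None and get_left_child().key < 5

assert result is False
assert calls == 1         # the second call never happened
```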


What's the point?

If we don't know the meaning of that if statement and the and within it, we don't actually know what will happen! Syntax can mislead us, especially if we have pre-existing intuitions. And if we want to reason about what a piece of syntax does, we need to understand what the syntax means.

Right now we're modeling boolean formulas. So let's understand the meaning of formulas; sometimes this is called their semantics.

Exercise: What can I do with a formula? What kind of operations is it meant to enable?

Think, then click!

If I have a formula, I can plug in various values into its variables and read off the result: does the overall formula evaluate to true or false when given those values? This is what we want to encode in our model.

Let's think of a boolean formula like a function from sets of variable values to a result boolean.
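That "function from variable values to a boolean" is exactly what a recursive evaluator computes. Here's a sketch in Python, encoding formulas as tagged tuples; this is precisely the recursion that Forge can't express directly, which is why we'll reify the semantics into a field instead:

```python
# Formulas: ("var", name), ("not", f), ("and", f, g), ("or", f, g)
def evaluate(f, true_vars):
    tag = f[0]
    if tag == "var": return f[1] in true_vars
    if tag == "not": return not evaluate(f[1], true_vars)
    if tag == "and": return evaluate(f[1], true_vars) and evaluate(f[2], true_vars)
    if tag == "or":  return evaluate(f[1], true_vars) or evaluate(f[2], true_vars)

# (not x) and y
fmla = ("and", ("not", ("var", "x")), ("var", "y"))
assert evaluate(fmla, {"y"}) is True
assert evaluate(fmla, {"x", "y"}) is False
```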


We'll need a way to represent "sets of variable values". Sometimes these are called a "valuation", so let's make a new sig for that.

sig Valuation {
  -- [HELP: what do we put here? Read on...]
}

We have to decide what fields a Valuation should have. Once we do that, we might start out by writing a recursive predicate or function, kind of like this pseudocode:

pred semantics[f: Formula, val: Valuation] {
  f instanceof Var => val sets the f var true
  f instanceof And => semantics[f.a_left, val] and semantics[f.a_right, val]
  ...
}

This won't work! Forge is not a recursive language; you won't be able to write a predicate that calls itself like this. So we've got to do something different. Let's move the recursion into the model itself, by adding a mock-boolean sig:

one sig Yes {}

and then adding a new field to our Formula sig (which we will, shortly, constrain to encode the semantics of formulas):

   satisfiedBy: pfunc Valuation -> Yes

This works but it's a bit verbose and quite tangled. It'd be more clear to just say that every formula has a set of valuations that satisfy it. But so far we haven't been able to do that.

Language change!

First, let's change our language to #lang forge. This gives us a language with more expressive power, but also some subtleties we'll need to address.

Relational Forge

This language is called Relational Forge, for reasons that will become apparent. For now, when you see #lang forge rather than #lang froglet, expect us to say "relational", and understand there's more expressivity there than Froglet gives you.

Now, we can write that every formula is satisfied by some set of valuations:

abstract sig Formula {
  -- Work around the lack of recursion by reifying satisfiability into a field.
  -- f.satisfiedBy contains an instance IFF that instance makes f true.
  -- [NEW] Relational Forge lets us create fields that contain _sets_ of values.
  satisfiedBy: set Valuation
}

We can now infer what field(s) Valuation should have. A Valuation isn't a formula, so it won't have a satisfiedBy field, but it does need to contain something.

Exercise: What does a Valuation contain, and how does that translate to its field(s) in Forge?

Think, then click!
sig Valuation {
  trueVars: set Var
}

Now we can encode the meaning of each formula as a predicate like this:

-- IMPORTANT: remember to update this if adding new fmla types!
-- Beware using this fake-recursion trick in general cases (e.g., graphs with cycles)
-- It's safe to use here because the data are tree-shaped. 
pred semantics
{
  -- [NEW] set difference
  all f: Not | f.satisfiedBy = Valuation - f.child.satisfiedBy
  -- [NEW] set comprehension, membership
  all f: Var | f.satisfiedBy = {i: Valuation | f in i.trueVars}
  -- ...
}

Exercise: We still need to say what f.satisfiedBy is for Or and And formulas. What should it be? (You might not yet know how to express it in Forge, but what should satisfy them, conceptually? Hint: think in terms of what satisfies their left and right subformulas.)

Think, then click!

We'd like Or to be satisfied when either of its children is satisfied. In contrast, And requires both of its children to be satisfied. We'll use union (+) and intersection (&) for this.

pred semantics
{
  -- [NEW] set difference
  all f: Not | f.satisfiedBy = Valuation - f.child.satisfiedBy
  -- [NEW] set comprehension, membership
  all f: Var | f.satisfiedBy = {i: Valuation | f in i.trueVars}
  -- [NEW] set union
  all f: Or  | f.satisfiedBy = f.o_left.satisfiedBy + f.o_right.satisfiedBy
  -- [NEW] set intersection
  all f: And | f.satisfiedBy = f.a_left.satisfiedBy & f.a_right.satisfiedBy
}

In hindsight: yes, this is why you can't use + for integer addition in Froglet; we reserve the + operator to mean set union.
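As a cross-check on intuition, here is a hypothetical Python analogue (not Forge) of this set-based semantics: each formula's satisfiedBy is computed as a set of valuations, using the same difference, comprehension, union, and intersection as the predicate above.

```python
from itertools import combinations

# Hypothetical Python analogue of the set-based semantics. Valuations are
# frozensets of true variables; satisfied_by maps a formula to the set of
# valuations that make it true.

variables = ["x", "y"]
valuations = {frozenset(c) for r in range(len(variables) + 1)
              for c in combinations(variables, r)}

def satisfied_by(formula):
    kind = formula[0]
    if kind == "var":   # {i: Valuation | f in i.trueVars}
        return {v for v in valuations if formula[1] in v}
    if kind == "not":   # Valuation - f.child.satisfiedBy
        return valuations - satisfied_by(formula[1])
    if kind == "or":    # f.o_left.satisfiedBy + f.o_right.satisfiedBy
        return satisfied_by(formula[1]) | satisfied_by(formula[2])
    if kind == "and":   # f.a_left.satisfiedBy & f.a_right.satisfiedBy
        return satisfied_by(formula[1]) & satisfied_by(formula[2])
    raise ValueError(f"unknown formula kind: {kind}")
```

For instance, the two sides of De Morgan's law come out with the same set of satisfying valuations under this computation.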

Is That All?

No. In fact, there are some Forge semantics questions you might have. That's not a joke: are you sure that you know the meaning of = in Forge now? Suppose I started explaining Forge's set-operator semantics like so:

  • Set union (+) in Forge produces a set that contains exactly those elements that are in one or both of the two arguments.
  • Set intersection (&) in Forge produces a set that contains exactly those elements that are in both of the two arguments.
  • Set difference (-) in Forge produces a set that contains exactly those elements of the first argument that are not present in the second argument.
  • Set comprehension ({...}) produces a set containing exactly those elements from the domain that match the condition in the comprehension.

That may sound OK at a high level, but you shouldn't let me get away with just saying that.

Exercise: Why not?

Think, then click!

What does "produces a set" mean? And what happens if I use + (or other set operators) to combine a set and another kind of value? Am I even allowed to do that? If so, what values can I combine with sets?

This isn't a new question! It comes up in programming contexts, too.

  • What happens when you add together a float and an int in Python? The result is automatically converted to a float.
  • What happens if you do the same in (say) OCaml? You'll get a type error unless you explicitly say to convert the int to a float.

So, by analogy, which of these options does Forge use?


In conversation, we're often dismissive of semantics. You'll hear people say, in an argument, "That's just semantics!" (to mean that the other person is being unnecessarily pedantic and quibbling about technicalities, rather than engaging). But when we're talking about how languages work, precise definitions matter a lot!

In Relational Forge, all values are sets. A singleton value is just a set with one element, and none is the empty set. So = is always set equality in Relational Forge. From now on, we'll embrace that everything in Relational Forge is a set, but introduce the ideas that grow from that fact gradually, resolving potential confusions as we go.
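One way to internalize this (a hypothetical Python encoding, not how Forge is implemented): represent every value as a set of tuples. Then singletons, none, and the set operators all live in one uniform world, and = really is always set equality.

```python
# Every value is a set of tuples. A "singleton" like Tim is a one-element
# set; none is the empty set; equality is just set equality.

Tim = {("Tim",)}          # singleton: one one-column tuple
Nim = {("Nim",)}
none = set()              # the empty set

people = Tim | Nim        # union mixes "singletons" and larger sets freely
```

In this encoding there is no separate "object" type to collide with the set operators, which is exactly the situation in Relational Forge.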

Why start with Froglet?

The fact that all values in Relational Forge are sets means that +, &, and even = are always well-defined. However, our natural intuition that sets are different from objects can make starting with this full generality confusing, and the learner's background is a major factor: not everyone has taken (or remembers) a discrete math course. So we start in a language where the power of sets is drastically reduced, letting us focus early on essential concepts like constraints and quantification.

On the other hand, sets are incredibly useful. Hence this chapter.

Returning to well-formedness

Now we have a new kind of ill-formed formula: one where the semantics haven't been properly applied. So we enhance our wellformed predicate:

pred wellformed {
  -- no cycles
  all f: Formula | not subFormulaOf[f, f]
  -- the semantics of the logic apply
  semantics
}

Some Validation

Here are some examples of things you might check in the model. Notice that some are:

  • validation of the model (e.g., that it's possible to have instances that disagree on which formulas they satisfy); and others are
  • results about boolean logic that we might prove in a math course, like De Morgan's Laws.

Consistency Checks

-- First, some tests for CONSISTENCY. We'll use test-expect/is-sat for these. 
test expect {
  nuancePossible: {
    wellformed
    -- [NEW] set difference in quantifier domain
    --   Exercise: why do we need the "- Var"?
    some f: Formula - Var | {
      some i: Valuation | i not in f.satisfiedBy
      some i: Valuation | i in f.satisfiedBy
    }    
  } for 5 Formula, 2 Valuation is sat  
  ---------------------------------
  doubleNegationPossible: {
    wellformed 
    some f: Not | {
      -- [NEW] set membership (but is it "subset of" or "member of"?)
      f.child in Not      
    }
  } for 3 Formula, 1 Valuation is sat  
} 

Properties of Boolean Logic

-- What are some properties we'd like to check? 
-- We already know a double-negation is possible, so let's write a predicate for it
-- and use it in an assertion. Since it's satisfiable (above) we need not worry 
-- about vacuous truth for this assertion.
pred isDoubleNegationWF[f: Formula] { 
    f in Not  -- this is a Not
    f.child in Not -- that is a double negation
    wellformed
} 
pred equivalent[f1, f2: Formula] {
    -- Note that this predicate is always with respect to scopes/bounds. That is, "equivalent" 
    -- here isn't full logical equivalence, but rather agreement within a given instance: 
    -- no Valuation in that instance satisfies one formula but not the other.
    f1.satisfiedBy = f2.satisfiedBy
}

assert all n: Not | isDoubleNegationWF[n] is sufficient for equivalent[n, n.child.child] 
  for 5 Formula, 4 Valuation

-- de Morgan's law says that 
-- !(x and y) is equivalent to (!x or !y) and vice versa. Let's check it. 

-- First, we'll set up a general scenario with constraints. 
pred negatedAnd_orOfNotsWF[f1, f2: Formula] {
    wellformed
    
    -- f1 is !(x and y)
    f1 in Not 
    f1.child in And      
    -- f2 is (!x or !y)
    f2 in Or
    f2.o_left in Not
    f2.o_right in Not
    f2.o_left.child = f1.child.a_left
    f2.o_right.child = f1.child.a_right      
}

assert all f1, f2: Formula | 
  negatedAnd_orOfNotsWF[f1, f2] is sufficient for equivalent[f1, f2]
  for 8 Formula, 4 Valuation

-- If we're going to trust that passing assertion, we need to confirm 
-- that its left-hand side is satisfiable! 
test expect {
    negatedAnd_orOfNotsWF_sat: {
        some f1, f2: Formula | negatedAnd_orOfNotsWF[f1, f2]
    } for 8 Formula, 4 Valuation is sat
}

Looking Forward

It turns out that sets are remarkably useful for describing relationships between objects in the world. We'll explore that further in the next sections.

Relations and Reachability

Consider this small Relational Forge model:

#lang forge
sig Person {
    friends: set Person,
    followers: set Person
}
one sig Nim, Tim extends Person {}

pred wellformed {
    -- friends is symmetric (followers might not be)
    all disj p1, p2: Person | p1 in p2.friends implies p2 in p1.friends 
    -- you cannot follow or friend yourself
    all p: Person | p not in p.friends and p not in p.followers
}
run {wellformed} for exactly 8 Person

Let's run it, get a reasonably large instance, and try to figure out how to express some goals in the evaluator.

EXERCISE: You'll be following along with each of these questions. Use the evaluator heavily---it's great for figuring out what different expressions evaluate to.

Expression: Nim's followers

Think, then click!

This is just Nim.followers.

Formula: Nim is reachable from Tim via "followers"

Think, then click!

We can use the reachable built-in: reachable[Nim, Tim, followers].

Expression: Nim's followers' followers

Think, then click!

Another application of the field! Nim.followers.followers.

But wait, what does this really mean? Since Nim.followers is a set, rather than a Person, should I be able to apply .followers to it? If we try this in the evaluator, it works, but we're no longer just doing a "field access"; something more must be going on.

Formula: Nim is reachable from Tim via the inverse of "followers" (what we might call "following")?

Think, then click!

Hmm. This seems harder. We don't have a field called following, and the reachable built-in takes fields!

...it does take fields, right? Or might it take something more flexible? Can we use set operations to adjust the potential edges that reachable looks at?

To figure that out, let's try something else.

Formula: Nim is reachable from Tim via followers, but not including Tim's friends?

Think, then click!

We might try reachable[Nim, Tim, (followers-Tim.friends)] but this will produce an error. Why? Well, one reason we might give is:

...because followers is a field but Tim.friends is a set.

But is that the real answer? The error complains about "arity": 2 vs 1. Let's type those two expressions into the evaluator and see what we get. For followers, we get a set of pairs of people. But Tim.friends is a set of singletons.

Arity, Relations, and Tuples

Arity is another word for how wide the elements of a set are. Here, we'd say that followers has arity 2 since elements of followers are pairs of atoms. Similarly, Tim.friends has arity 1 since its elements are singletons. So Forge is pointing out that taking the set difference of these two makes no sense; their elements are shaped differently.

When we're talking about sets in this way, we sometimes call them relations. E.g., the followers field is a binary relation because it has arity 2. We'll call elements of relations tuples. E.g., if Nim follows Tim, the tuple (Tim, Nim) would be present in followers.

In Relational Forge, reachable doesn't take "fields"; it takes relations. Specifically, binary relations, which define the steps it can use to connect the two objects. That's the fact we'll use to solve the 2 problems above.

(Attempt 2) Formula: Nim is reachable from Tim via the inverse of "followers" (what we might call "following")?

Now that we know followers is a binary relation, we can imagine flipping it to get its inverse. How can we do that? Well, there are multiple ways! We could write a set-comprehension:

{p1, p2: Person | p1 in p2.followers}

The order matters here! If p1 is in p2.followers, then there is an entry in the relation that looks like (p2, p1). We could make this more explicit by writing:

{p1, p2: Person | p2->p1 in followers}

Now that we know followers is a set, this makes sense! The product (->) operator combines p2 and p1 into a binary tuple, which may (or may not) be in followers.

Forge provides an operator that does this directly for binary relations: transpose (~). So we could write instead:

~followers

Which should you use? It's up to you! Regardless, we could now answer this question with:

reachable[Nim, Tim, ~followers]
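To see what transpose does concretely, here is a quick Python sketch (a hypothetical encoding: a binary relation as a set of pairs, not how Forge represents it):

```python
# ~R flips every (a, b) tuple of a binary relation R into (b, a).

def transpose(rel):
    """The transpose (~) of a binary relation."""
    return {(b, a) for (a, b) in rel}

# Recall: if Nim follows Tim, then (Tim, Nim) is in followers.
followers = {("Tim", "Nim"), ("Nim", "Person0")}
following = transpose(followers)   # contains (Nim, Tim) and (Person0, Nim)
```

Transposing twice gets back the original relation, matching the intuition that ~ merely swaps the two columns.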

(Attempt 2) Formula: Nim is reachable from Tim via followers, but not including Tim's friends?

Everything is a set, so let's build the subset of followers that doesn't involve anyone in Tim.friends. We can't write followers-(Tim.friends), since that's an arity mismatch. Somehow, we need to remove entries in followers involving one of Tim's friends and anybody else.

One way is to use the product operator to build the binary relation we want reachable to use. E.g.:

reachable[Nim, Tim, followers-(Tim.friends->Person)]

You may notice that we've now used -> in what seem like two different ways. We used it to combine specific people, p1 and p2 above, into a single tuple. But now we're using it to combine two sets into a set of tuples. This flexibility is a side effect of sets being the fundamental concept in Relational Forge:

  • the product of {(Tim)} and {(Nim)} is {(Tim, Nim)};
  • the product of {(Person1), (Person2)} and {(Person3), (Person4)} is {(Person1, Person3), (Person1, Person4), (Person2, Person3), (Person2, Person4)}.

Formally, -> is the cross-product operation on sets.

You'll see this apparent double meaning when using in and =, too: singletons are single-element sets, where the element is a one-column tuple.
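The two "uses" of -> above are really one operation on sets of tuples. A small Python sketch (hypothetical encoding) makes that concrete:

```python
from itertools import product

# -> concatenates each tuple of the left set with each tuple of the right
# set. Applied to singletons it builds one pair; applied to larger sets it
# builds the full cross product.

def arrow(a, b):
    """Cross product (->) of two relations (sets of tuples)."""
    return {ta + tb for ta, tb in product(a, b)}

pair = arrow({("Tim",)}, {("Nim",)})            # one pair: (Tim, Nim)
grid = arrow({("Person1",), ("Person2",)},
             {("Person3",), ("Person4",)})      # all four combinations
```

The same `arrow` handles both examples from the bullet list above, because a "specific person" is itself just a one-element set.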

What is Dot, Really?

In Relational Forge, the dot operator is called relational join. The expression A.B combines two arbitrary relations A and B by seeking out rows with common values in their rightmost (in A) and leftmost (in B) columns. Concretely, if A is an n-ary relation and B is m-ary, then A.B equals the (n+m-2)-ary relation:

{(a_1, ..., a_(n-1), b_2, ..., b_m) | (a_1, ..., a_n) in A, (b_1, ..., b_m) in B, and a_n = b_1}
That's a lot, so let's work through some examples. Suppose that we have the following relation for the friend field of Person:

| column 1 | column 2 |
|----------|----------|
| Tim      | Person0  |
| Tim      | Person1  |
| Person0  | Person1  |
| Person0  | Tim      |
| Person1  | Person0  |
| Person1  | Person2  |
| Person1  | Tim      |
| Person2  | Person1  |
| Person2  | Nim      |
| Nim      | Person2  |

We know that join can often work like field access. So we'd expect Tim.friend to produce a set containing Person0 and Person1. Why? Because we see Tim in the left column of 2 rows, and the right column of those rows contains Person0 and Person1.

But what about each of the following?

Try the evaluator!

To build intuition for ., try expressions like this in the evaluator after reading the following examples.

friend.Tim

Here we run the same process that gave us a value for Tim.friend, but in reverse: in which rows does Tim appear in the right column?

Because friend is symmetric, it's the same: friend.Tim is the same as Tim.friend, a set containing Person0 and Person1.


friend.friend

Before trying to compute the value, first answer:

Exercise: What is the arity of friend.friend?

Think, then click!

Well, friend has arity 2, and a join subtracts one column from each side. So we expect friend.friend to have arity 1+1=2.

Now, what's the value? Start by looking for matches. Here are some of them (the ones on Person0):

Continuing, we'll find matches on Person1, Tim, and others. Because some people have a lot of friends, the graph is getting difficult to read, but notice what's happening: we're identifying matching pairs of rows in each relation.

Finally, we generate a new row in the result for every pair of matching rows, deleting the inner columns we matched on: Tim->Person1 is in the result because Tim->Person0 is in friend (the left-hand side of the join) and Person0->Person1 is in friend (the right-hand side of the join). We'll get one row for every match (but we won't keep duplicates):

| column 1 | column 2 |
|----------|----------|
| Tim      | Person0  |
| Tim      | Person1  |
| Tim      | Person2  |
| Tim      | Tim      |
| Person0  | Person0  |
| Person0  | Person1  |
| Person0  | Person2  |
| Person0  | Tim      |
| Person1  | Person0  |
| Person1  | Person1  |
| Person1  | Tim      |
| Person1  | Nim      |
| Person2  | Person0  |
| Person2  | Person2  |
| Person2  | Tim      |
| Nim      | Person1  |
| Nim      | Nim      |

This is the set of friend-of-friend relationships. If we computed friend.friend.friend, we'd get the friend-of-friend-of-friend relationships, and so on.
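The matching process above is mechanical enough to sketch in a few lines of Python (a hypothetical encoding, relations as sets of tuples; this is not how Forge computes join, but it follows the same definition):

```python
# A.B: match the last column of each tuple in A against the first column of
# each tuple in B, keep the outer columns, and drop duplicates (it's a set).

def join(a, b):
    """Relational join (.) for relations of arity >= 1."""
    return {ta[:-1] + tb[1:] for ta in a for tb in b if ta[-1] == tb[0]}

friend = {("Tim", "Person0"), ("Tim", "Person1"),
          ("Person0", "Person1"), ("Person0", "Tim"),
          ("Person1", "Person0"), ("Person1", "Person2"), ("Person1", "Tim"),
          ("Person2", "Person1"), ("Person2", "Nim"),
          ("Nim", "Person2")}

tims_friends = join({("Tim",)}, friend)   # field access is a special case
fof = join(friend, friend)                # the friend-of-friend relation
```

Note how "field access" falls out for free: joining the singleton {(Tim)} against the relation matches Tim against the left column and keeps only the right column.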

TODO: double-check this


Exercise: What about Tim.Tim?

Think, then click!

This will give an error because the result has arity 0.

How Reachability Works

If we wanted to encode reachability without using the built-in reachable predicate, we could start by writing a helper predicate:

pred reachable2[to: Person, from: Person, via: Person -> Person] {
    to in 
    from.via +
    from.via.via +
    from.via.via.via 
    -- ...and so on...
}

But we always have to stop writing somewhere. We could write the union out to 20 steps, and still we wouldn't catch some very long (length 21 or higher) paths. So we need some way to talk about unbounded reachability, or at least reachability up to a path length that's influenced by the problem bounds.

Forge's reachable built-in does this, but it's just a facade over a new relational operator: transitive closure (^R).

The transitive closure ^R of a binary relation R is the smallest binary relation such that:

  • if x->y is in R, then x->y is in ^R; and
  • if x->z is in R and z->y is in ^R, then x->y is in ^R.

That is, ^R encodes exactly what we were trying to achieve above. The reachable[to, from, f1, ...] built-in is just syntactic sugar for:

    to in from.^(f1 + ...)

You might remember that reachable always evaluates to true if we give none as its first argument. This is because of the translation above: if to is the empty set, then it is a subset of anything.
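The same idea can be sketched in Python (hypothetical encoding): iterate the join until nothing new appears, then desugar reachable into a membership check.

```python
# ^R: union R, R.R, R.R.R, ... until the set stops growing.

def join(a, b):
    return {ta[:-1] + tb[1:] for ta in a for tb in b if ta[-1] == tb[0]}

def closure(rel):
    """Transitive closure (^) of a binary relation."""
    result = set(rel)
    while True:
        bigger = result | join(result, rel)
        if bigger == result:
            return result
        result = bigger

def reachable(to, frm, rel):
    """reachable[to, frm, rel] desugars to: to in frm.^rel."""
    # In Forge, `in` is subset, which is why reachable[none, ...] is true.
    return (to,) in join({(frm,)}, closure(rel))
```

Because the relation is finite, the loop must terminate: each iteration either adds a tuple or stops.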

More Sets and Induction (Mutual Exclusion)

If you have two independent threads of execution running concurrently, many subtle bugs can manifest. For instance, if both threads can write to the same region of memory, they might overlap their writes. A great example of this is simply incrementing a counter. In most computer systems, an operation like:

  counter = counter + 1;

is not atomic by itself. Thus, the following sequence of operations would be problematic:

  • Thread 1: read the current value of counter
  • Thread 2: read the current value of counter
  • Thread 1: add 1 to that value
  • Thread 1: write the new value to counter
  • Thread 2: add 1 to that value
  • Thread 2: write the new value to counter

This interleaving is problematic because the counter's value ends up only 1 higher than its original value, even though it was incremented twice.

We often call the property that such traces can't exist mutual exclusion, and the piece of code that shouldn't ever be run by 2 threads at once the critical section. Today we'll model a very simple approach to mutual-exclusion—which turns out not to work, but is a first step on the way to something that does.

The idea comes from how we as humans negotiate access to a shared resource like the spoken-communication channel in a meeting. If we want to talk, we raise our hand.

A Simplified Mutual-Exclusion Algorithm

Consider the pseudocode below, and imagine it running on two separate threads of execution. I've marked program locations in square brackets---note how they correspond to the spaces in between lines of code executing.

TODO: incorporate Pamela's comments on this; return to validation

while(true) { 
    // [state: uninterested]
    this.flag = true;
    // [state: waiting]
    while(other.flag == true);    
    // [state: in-cs]    
    run_critical_section_code(); // does whatever; we don't care about details
    this.flag = false;    
}

Both processes will always continuously try to access the critical section. When they become interested, they set a public flag bit to true. Then, they don't enter until their counterpart's flag is false. When they're done executing the critical section, they lower their flag and restart.

Notice we aren't modeling the critical section itself. The exact nature of that code doesn't matter for our purposes; we just want to see whether both processes can be at the in-cs location at once. If so, mutual exclusion fails!

Modeling

Let's start with a simple model. We'll critique this model as the semester progresses, but this will be useful enough to get us started.

abstract sig Location {}
one sig Uninterested, Waiting, InCS extends Location {}

abstract sig Process {}
one sig ProcessA, ProcessB extends Process {}

sig State {
    loc: func Process -> Location,
    flags: set Process
}

An initial state is one where all processes are uninterested, and no process has raised its flag:

pred init[s: State] {
    all p: Process | s.loc[p] = Uninterested
    no s.flags 
}

We then have three different transition predicates, each corresponding to one of the lines of code above, and a transition predicate delta that represents any currently-possible transition:


pred raise[pre: State, p: Process, post: State] {
    pre.loc[p] = Uninterested
    post.loc[p] = Waiting
    post.flags = pre.flags + p
    all p2: Process - p | post.loc[p2] = pre.loc[p2]
}

pred enter[pre: State, p: Process, post: State] {
    pre.loc[p] = Waiting 
    pre.flags in p -- no other processes have their flag raised
    post.loc[p] = InCS    
    post.flags = pre.flags
    all p2: Process - p | post.loc[p2] = pre.loc[p2]
}

pred leave[pre: State, p: Process, post: State] {
    pre.loc[p] = InCS    
    post.loc[p] = Uninterested    
    post.flags = pre.flags - p
    all p2: Process - p | post.loc[p2] = pre.loc[p2]
}

-- the keyword "transition" is reserved
pred delta[pre: State, post: State] {
    some p: Process | 
        raise[pre, p, post] or
        enter[pre, p, post] or 
        leave[pre, p, post]
}

We won't create a Trace sig or traces predicate at all, because we're going to apply induction rather than trace generation.

Model Validation

We should do some quick validation at this point. The most basic would be checking that each of our transitions is satisfiable:

test expect {
    canEnter: {        
        some p: Process, pre, post: State | enter[pre, p, post]        
    } is sat
    canRaise: {        
        some p: Process, pre, post: State | raise[pre, p, post]        
    } is sat    
    canLeave: {        
        some p: Process, pre, post: State | leave[pre, p, post]        
    } is sat    
}

In a real modeling situation, we would absolutely add more checks. Even here, we might regret not testing more...

Does Mutual Exclusion Hold?

Before we run Forge, ask yourself whether the algorithm above guarantees mutual exclusion. (It certainly has another kind of error, but we'll get to that later. Focus on this one property for now.)

It seems reasonable that the property holds. But if we try to use the inductive approach to prove that:

pred good[s: State] {
    #{p: Process | s.loc[p] = InCS} <= 1
}

pred startGoodTransition[s1, s2: State] {    
    good[s1]
    delta[s1,s2]
}
assert all s: State | init[s] is sufficient for good[s] for exactly 1 State
assert all pre, post: State | startGoodTransition[pre, post] is sufficient for good[post] for exactly 2 State

The inductive case fails. Let's see what the counterexample is:

    run {
      not {
        all pre, post: State | 
          delta[pre, post] and good[pre] implies good[post]
        }
    } for exactly 2 State

Yields, in the table view:

Notice that neither process has raised its flag in either state. This seems suspicious, and might remind you of the binary-search model, where the good predicate wasn't strong enough to be inductive, but the counterexample Forge found wasn't actually reachable. This is another such situation.

Refresher: Enriching The Invariant

This counterexample shows that the property we wrote isn't inductive. But it might still be an invariant of the system—it's just that Forge has found an unreachable prestate. To prevent that, we'll add more conditions to the good predicate (recall: we call this enriching the invariant; it's a great demo of something apparently paradoxical: proving something stronger can be easier). Let's also say that, in order for a process to be in the critical section, its flag needs to be true:

pred good2[s: State] {
    -- enrichment: if in CS, flag must be raised
    all p: Process | s.loc[p] = InCS implies p in s.flags  
    -- original: mutual exclusion      
    #{p: Process | s.loc[p] = InCS} <= 1        
}

We re-run this, and the inductive case still fails! Look closely at the counterexample. The problem now is that the flag also has to be raised if a process is Waiting.

pred good3[s: State] {
    -- enrichment: if in CS or Waiting, flag must be raised
    all p: Process | (s.loc[p] = InCS or s.loc[p] = Waiting) implies p in s.flags    
    -- original: mutual exclusion
    #{p: Process | s.loc[p] = InCS} <= 1        
}

What are we changing?

Notice again that we're only strengthening the thing we're trying to prove---not altering the model itself in any way.

At this point, the inductive check passes. We've just shown that this algorithm satisfies mutual exclusion. It might not give us some other properties we want, but this one is guaranteed.

Validating the Check

We should probably make sure the two proof steps (the base case and the inductive step) aren't passing vacuously:

test expect {
    baseCaseVacuity: {
        some s: State | init[s] and good3[s]
    } for exactly 1 State is sat
    
    inductiveCaseVacuity: {
        some pre, post: State | 
                delta[pre, post] and good3[pre]
    } for exactly 2 State is sat

}

Fortunately, these both pass. So far, so good. But does this algorithm do everything we want?

Going Beyond Assertions

In the last section, we were modeling this simplified (and perhaps buggy) mutual-exclusion protocol:

while(true) { 
     // [location: uninterested]
    this.flag = true;  // visible to other threads!
    //  [location: waiting]
    while(other.flag == true);    
    //  [location: in-cs] // "critical section"   
    this.flag = false;    
}

Thread vs. Process

I'm going to use the terms "process" and "thread" interchangeably for this model. The difference is vital when programming, but it's not important for our purposes today.

Exercise: If there are 3 possible locations for each process, and 2 possible flag values for each process, how many possible states are there overall in the system (without considering reachability)?

Think, then click!

Every process has 3 × 2 = 6 possible states: 3 locations times 2 flag values. If 2 processes are executing this loop, there are 6 × 6 = 36 possible states overall in the system. (Of course, we hope that not all of them are reachable!)

Our mutual exclusion property, which says that at most one process can be running the critical section at a time, is a statement that 4 specific states are unreachable: the ones where both processes are in the critical-section location (with any possible combination of boolean flags).

That property wasn't "inductive": Forge could find transitions with a good prestate that end in one of those 4 bad states. So we enriched the invariant to also say that any thread in the waiting or critical-section locations must also have a raised flag. This prevented Forge from using many prestates it could use before: a state where a process is in the critical section but its flag is down, for example.

Now we're going to do two things:

  • build intuition for how the above actually worked; and
  • talk about how we could approach verifying other, richer kinds of property.

Drawing The Picture

I really don't want to draw 36 states along with all their corresponding transition arcs. But maybe I don't need to. Let's agree that there are, in principle, 36 states, but just draw the part of the system that's reachable. We'll start with the initial state, and abbreviate the location tags to make writing them convenient: "un" for "uninterested", "cs" for "critical section", and "w" for "waiting".

TODO: redraw with "un" rather than "dis"

Fill in the rest of the reachable states and transitions; don't add unreachable states at all. You should find the picture is significantly smaller than it would be if we had drawn all states.

Keep going! In diagrams like this, where there are only 2 processes, I like to split the state and draw the transition arcs for each process moving separately in different directions. (We're assuming, for now, that only one process moves at a time, even though they are executing concurrently.)

I've marked the inability of a process to make progress with an "X"; it's a transition that can't be taken.

By drawing the entire graph of reachable states, we can see that the "bad" states are not reachable. This protocol satisfies mutual exclusion (as Forge showed us in the last section).

Other Properties

Just mutual exclusion isn't good enough! After all, a protocol that never gave access to the critical section would guarantee mutual exclusion. We need at least one other property, one that might turn out to be more complex. We'll get there in 2 steps.

Property: Deadlock Freedom

If, at some point, nobody can make progress, then surely the protocol isn't working. Both processes would be waiting forever, unable to ever actually get work done.

A state where no process can transition is called a deadlock state. Verifying that a system is free of deadlocks is a common verification goal.

Exercise: Does the system above satisfy deadlock-freedom? (You can check using the diagram we produced.)

No. The state where both processes are Waiting, with both flags raised, is reachable but has no exit transitions: neither thread can make progress while the other's flag is up. Once the system is there, it's stuck there. And we can see that just by visually searching the subsystem of reachable states.

This kind of verification problem---checking properties of a transition system---is called model checking. Interestingly, there are other kinds of verification tools that use this graph-search approach, rather than the logic- and solver-based approach that Forge uses; you'll hear these tools referred to as explicit-state model checkers and symbolic model checkers respectively.

Exercise: How could we check for deadlocks using just the graph we drew and our eyes?

Think, then click!

In the same way we looked for a failure of mutual exclusion: we seek a reachable state with _no_ transitions out. And in this case, we find such a state.

Exercise: How could we check for deadlock in Forge?

We could either try the inductive approach, or use the finite-trace method. In the former, we would express that a "good" state is one where some transition is enabled---that is, one where the guard portion of some transition evaluates to true.
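Outside Forge, the explicit-state version of this check is a graph search. Here is a Python sketch; the graph below is a hand-abbreviated version of the reachable sub-system, with flags omitted because, among reachable states, a flag is up exactly when its process is waiting or in the critical section:

```python
def deadlocks(graph, init):
    """Reachable states with no outgoing transitions."""
    seen, todo = {init}, [init]
    while todo:
        for nxt in graph.get(todo.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return {s for s in seen if not graph.get(s)}

# Reachable sub-system of the flag protocol; states written "locA,locB".
protocol = {
    "un,un": ["w,un", "un,w"],
    "w,un":  ["cs,un", "w,w"],
    "un,w":  ["un,cs", "w,w"],
    "cs,un": ["un,un", "cs,w"],
    "un,cs": ["un,un", "w,cs"],
    "w,w":   [],                 # both flags up: neither may enter
    "cs,w":  ["un,w"],
    "w,cs":  ["w,un"],
}
```

Running `deadlocks(protocol, "un,un")` flags exactly the both-waiting state, matching the diagram-based answer above.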

Exercise: Working from the graph you drew, how could we fix the problem?

We could add a transition from the deadlock state. Maybe we could allow the first thread to always take priority over the second:

This might manifest in the code as an extra way to escape the while loop. Of course, if we prioritize the first thread, the second thread is going to be very unhappy with this fix! But, regardless, adding this transition technically fixes the deadlock problem, and this property will now pass as well.

Property: Non-Starvation

Even if there are no deadlocks, it's still possible for one thread to be waiting forever. We'd prefer a system where it's impossible for one thread to be kept waiting while the other thread continues to completely hog the critical section. This property is called non-starvation; more formally, it says that every thread must always (at any point) eventually (at some point) get access to the resource.

Exercise: How could we check non-starvation in this graph?

Think, then click!

Not by looking for a single "bad state". That won't suffice.

Safety Versus Liveness: Intuition

Notice the differences between these properties. In particular, consider what a full counterexample trace for each must look like, if we were inclined to produce one.

  • For mutual-exclusion and (in this formulation) deadlock-freedom, a counterexample trace could be finite. After some number of transitions, we'd reach a state where a deadlock or failure of mutual-exclusion has occurred. At that point, it's impossible for the system to recover; we've found an issue and the trace has served its purpose.
  • For a failure of non-starvation, on the other hand, no finite trace can suffice. It's always possible that just ahead, the system will suddenly recover and prevent a thread from being starved. So here, we need some notion of an infinite counterexample trace such that some thread never, ever, gets access.

The difference here is a fundamental distinction in verification. We call properties that have finite counterexamples safety properties, and properties with only infinite counterexamples liveness properties.

Definitions

People often describe safety properties as "something bad never happens" and liveness properties as "something good must happen". I don't like this wording, because it assumes an understanding of "goodness" and "badness". Instead, think about what a counterexample needs to look like. Then, one kind of property really is fundamentally different from the other, without requiring a notion of "good" or "bad".

You'll usually find that a liveness property is more computationally complex to check. This doesn't mean that verifying liveness properties is always slower. It's just that we, and Forge, usually have to bring some additional tricks to bear. In the context of a finite state system, searching for an infinite counterexample amounts to looking for a reachable cycle in the graph, rather than just a single bad state.


We'll take a short break in the next section to say more about how Forge works. Then we'll return to the problem of defining infinite counterexample traces.

Bounds and Booleans: How Forge Works

"The world is everything that is the case. The world is the totality of facts, not of things. The world is determined by the facts, and by these being all the facts."

Ludwig Wittgenstein (Tractatus Logico-philosophicus)

How Does Forge Work?

We've hinted about this a bit in the past, but now we'll go a bit deeper into how Forge works.

Recall that every run (or test, or example) command defines a search problem: find some instance that satisfies the given constraints and is within the given bounds. When you click "Run", Forge compiles this search problem into a boolean satisfiability problem, which it then gives to an external boolean solver package.

There are complications, though. The search problem Forge needs to solve is in terms of atomic objects: objects of particular types, which can go into sets and relations and so on. In contrast, a boolean problem is in terms of just boolean variables: atomic truths that can be combined with and, or, etc. Somehow, we need to bridge that gap from object to boolean.

What Boolean Solvers Understand

As an example of where Forge needs to end up, here's a real problem we could hand to a boolean solver. It's in a standard format called DIMACS, and it describes a way to find a solution to the 4-queens puzzle. There are many other ways to express this problem, but we'll focus on this one. What do you think it's saying, exactly?

Click to expand the DIMACS problem definition
c DIMACS for 4 queens
c 
p cnf 16 84
1 2 3 4 0
-1 -2 0
-1 -3 0
-1 -4 0
-2 -3 0
-2 -4 0
-3 -4 0
5 6 7 8 0
-5 -6 0
-5 -7 0
-5 -8 0
-6 -7 0
-6 -8 0
-7 -8 0
9 10 11 12 0
-9 -10 0
-9 -11 0
-9 -12 0
-10 -11 0
-10 -12 0
-11 -12 0
13 14 15 16 0
-13 -14 0
-13 -15 0
-13 -16 0
-14 -15 0
-14 -16 0
-15 -16 0
1 5 9 13 0
-1 -5 0
-1 -9 0
-1 -13 0
-5 -9 0
-5 -13 0
-9 -13 0
2 6 10 14 0
-2 -6 0
-2 -10 0
-2 -14 0
-6 -10 0
-6 -14 0
-10 -14 0
3 7 11 15 0
-3 -7 0
-3 -11 0
-3 -15 0
-7 -11 0
-7 -15 0
-11 -15 0
4 8 12 16 0
-4 -8 0
-4 -12 0
-4 -16 0
-8 -12 0
-8 -16 0
-12 -16 0
-1 -6 0
-1 -11 0
-1 -16 0
-2 -7 0
-2 -12 0
-2 -5 0
-3 -8 0
-3 -6 0
-3 -9 0
-4 -7 0
-4 -10 0
-4 -13 0
-5 -10 0
-5 -15 0
-6 -11 0
-6 -16 0
-6 -9 0
-7 -12 0
-7 -10 0
-7 -13 0
-8 -11 0
-8 -14 0
-9 -14 0
-10 -15 0
-10 -13 0
-11 -16 0
-11 -14 0
-12 -15 0

Even without parsing it with a computer, the format tells us a lot about what a purely boolean solver understands. Here are a few facts about DIMACS:

  • Boolean variables in DIMACS are represented by integers greater than zero.
  • If p is a variable, then not p is represented as the integer -p.
  • Lines starting with a c are comments.
  • The line p cnf 16 84 says there are 16 variables and 84 clauses. A clause is a set of variables (or negated variables) all combined with or. E.g., 4 8 12 -16 0 means 4 or 8 or 12 or (not 16). (The 0 is a line-terminator.)
  • To satisfy the input, every clause must be satisfied.

A set of constraints expressed as a set of clauses, each of which must hold true, is said to be in Conjunctive Normal Form (CNF). Boolean solvers often expect input in CNF, for algorithmic reasons we'll soon see.
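
For example, the first two clause lines of the DIMACS file above, written in ordinary logical notation, are:

```latex
(x_1 \lor x_2 \lor x_3 \lor x_4) \land (\lnot x_1 \lor \lnot x_2) \land \cdots
```

Each line of the file contributes one clause, and all the clauses are joined by a single big "and".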

Now that you know how to read the input format, you might be able to see how the boolean constraints work to solve the 4-queens problem. Any ideas?

Think, then click!

There's one variable for every square on the board: variables 1 through 4 for the first row, 5 through 8 for the second row, and so on. The clause 1 2 3 4 says that there must be a queen somewhere on the first row. The clause -1 -2 says that if there is a queen on the first square of that row, there cannot also be a queen on the second square. And so on.

"The world is that which is the case"

Consider this Forge model and corresponding run command:

abstract sig Person {
  followers: set Person
}
one sig Alice, Bob, Charlie extends Person {}
run {some followers} for exactly 3 Person 

How many potential instances are there? Note that there can only ever be exactly 3 people, since Person is abstract.

Think, then click!

There are always exactly 3 people, and the only relation that can vary is followers, which has 2 columns. That means 3 * 3 = 9 potential pairs of people, and the field contains a set of those. The set either contains or does not contain each pair. So there are 2^9 = 512 potential instances.


Notice how we reached that number. There are 9 potential pairs of people. 9 potential follower relationships. 9 essential things that may, or may not, be the case in the world. Nothing else.

If you run Forge on this model, you'll see statistics like these:

#vars: (size-variables 10); #primary: (size-primary 9); #clauses: (size-clauses 2)
Transl (ms): (time-translation 122); Solving (ms): (time-solving 1)

The timing may vary, but the other stats will be the same. The thing to focus on is: 9 primary variables. Primary variables correspond to these atomic truths, which in this case are just who follows whom in our fixed 3-person world: the number of rows that are potentially in the followers relation.

Let's try increasing the size of the world:

run {some followers} for 4 Person

Now we have a 4th person---or rather, we might have a 4th person. When we run, Forge shows:

#vars: (size-variables 27); #primary: (size-primary 17); #clauses: (size-clauses 18)

We've gone from 9 to 17 primary variables. Why?

Think, then click!

There is another potential Person in the world; the world may be either size 3 or 4. Whether or not this fourth person exists is 1 new Boolean variable. And since there are 4 potential people in the world, there are now 4 * 4 = 16 potential follower relationships.

This equals 16 + 1 = 17 variables.


This is how Forge translates statements about atoms into statements about booleans.

Intermediate Representation: Lower Bounds, Upper Bounds

Not every potential boolean needs to actually be considered, however. You might remember that annotations like {next is linear} or partial instances defined by example or inst further limit the set of variables before the boolean solver encounters them. To understand this better, let's increase the verbosity setting in Forge. This will let us look at what Forge produces as an intermediate problem description before converting to boolean logic.

option verbose 5

Let's focus on a few lines. First, you should see this somewhere:

(univ 20)

This tells the compiler that there are 20 potential objects in the world. (Why 20? Because the default bitwidth is 4: that's 16 integers, plus 4 potential people.) These objects get assigned integer identifiers by the compiler.

3 different meanings

This is an unfortunate overlap in the backend solver engine's language: all atoms, including atoms of type Int, get assigned integers by the engine. Moreover, the boolean solver itself uses integer indexes for boolean variables. These are not the same thing!

Next, the compiler gets provided a lower and upper bound for every relation in the model.

  • The lower bound is a set of tuples that must always be in the relation.
  • The upper bound is a set of tuples that may be in the relation.

For example, here are the bounds on Int:

(r:Int [{(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)} :: {(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)}])

The lower bound comes first, then a ::, then the upper bound. These singleton tuples containing 0 through 15 are actually the representatives of integers -8 through 7. This is an artifact of how the solver process works.

Here's the bound on Person and its three sub-sigs:

(r:Person [{(16) (17) (18)} :: {(16) (17) (18) (19)}])
(r:Alice [{(16)} :: {(16)}])
(r:Bob [{(17)} :: {(17)}])
(r:Charlie [{(18)} :: {(18)}])

The lower bound on Person contains 3 object identifiers, because there must always be 3 distinct objects (representing our three one sigs). There's an object in the upper, but not the lower, bound, because that fourth person may or may not exist. Alice, Bob, and Charlie are exactly set to be those 3 different always-present objects.

Finally, let's look at a field's bounds:

(r:followers [(-> none none) :: {(16 16) (16 17) (16 18) (16 19) (17 16) (17 17) (17 18) (17 19) (18 16) (18 17) (18 18) (18 19) (19 16) (19 17) (19 18) (19 19)}])

The followers relation may be empty, and it may contain any of the 16 ordered pairs of potential Person objects.

Any tuple in the upper bound of a relation, that isn't also in the lower bound, gets assigned a boolean variable.

  • If a tuple isn't in the upper bound, it can never exist in an instance---it would always be assigned false---and so we needn't assign a variable.
  • If a tuple is in the lower bound, it must always exist in an instance---it would always be assigned true---and so we can again omit a variable.
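
You can watch this at work by supplying part of the instance yourself. Here's a hypothetical sketch using an `inst` block (check the Forge documentation for the exact partial-instance syntax; the backquoted atom names are ones we're inventing here):

```forge
-- Pin down the atoms, and force one tuple into the lower bound of `followers`.
-- A tuple in the lower bound is always present, so the compiler needn't
-- allocate a Boolean variable for it.
inst someFollowing {
    Person = `Alice + `Bob + `Charlie
    Alice = `Alice
    Bob = `Bob
    Charlie = `Charlie
    followers ni `Alice -> `Bob   -- lower bound: Bob is one of Alice's followers
}
run {some followers} for exactly 3 Person for someFollowing
```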

From Forge Constraints to Boolean Constraints

Once we know the set of Boolean variables we'll use, we can translate Forge constraints to purely Boolean ones via substitution. Here's an example of how a basic compiler, without optimizations, might work. Suppose we have the constraint:

all p: Person | Alice in p.followers

There are no all quantifiers in Boolean logic. How can we get rid of it?

Think, then click!

An all is just a big and over the upper bound on Person. So we substitute (note here we're using Person3 as if it were defined in our model, because that potential fourth person is part of every instance's upper bound):

Alice in Alice.followers
Alice in Bob.followers
Alice in Charlie.followers
(Person3 in Person) implies Alice in Person3.followers

There are similar rules for other operators: a some becomes a big or, a relational join becomes a some-quantified statement about the existence of a value to join on, which then becomes a big or, etc.
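
For instance, the expansion for some mirrors the one for all, except that the optional fourth atom is guarded by and rather than implies (again treating Person3 as the potential fourth person):

```forge
-- some p: Person | Alice in p.followers
-- ...expands to the disjunction:
Alice in Alice.followers
or Alice in Bob.followers
or Alice in Charlie.followers
or ((Person3 in Person) and Alice in Person3.followers)
```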

Example optimization: Skolemization

Forge performs a process called Skolemization, named after the logician Thoralf Skolem, to convert specific some quantifiers into supplemental relations.

The idea is: to satisfy a some quantifier, some atom exists that can be plugged into the quantifier's variable x to make the child formula true. Skolemization reifies that object witness into the model as a new relational constant $x. This:

  • makes debugging easier sometimes, since you can immediately see what might satisfy the quantifier constraint; and
  • sometimes aids in efficiency, especially in a "target poor" environment like an unsatisfiable problem.

By convention, these variables are prefixed with a $. So if you see a relation labeled $x, it's a Skolem relation that points to a value for a some quantified variable x. The relation will grow wider for every all quantifier that wraps the some quantifier being Skolemized. To see why that happens, suppose that we have a constraint: all p: Person | all b: Bank | hasAccount[p,b] implies some i: Int | bankBalance[p,b,i]. This constraint says that if a person has a bank account at a certain bank, there's a balance entered for that account. That balance isn't constant! It's potentially different for every Person-Bank pairing. Thus, $i would have arity 3: 2 for the "input" and 1 for the "output".
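
To see Skolemization in action, try a run like this (a hedged sketch, reusing the followers model from above):

```forge
-- "Somebody is their own follower." To satisfy the `some`, the solver must
-- produce a witness; Skolemization names that witness, so the resulting
-- instance will contain a Skolem relation (labeled something like $p)
-- pointing at one specific person satisfying the body.
run { some p: Person | p in p.followers } for exactly 3 Person
```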

Skolem Depth

You can change how deeply some quantifiers will get Skolemized by using the skolem_depth option in Forge.

Symmetry Breaking

Let's return to the original Followers model:

abstract sig Person {
  followers: set Person
}
one sig Alice, Bob, Charlie extends Person {}
run {some followers} for exactly 3 Person 

We decided it probably had 512 instances. But does it really? Let's hit Next a few times, and count! Actually, that sounds like a lot of work. Let's simplify things a bit more, instead:

abstract sig Person {
  follower: one Person -- changed: replace `set` with `one` 
}
one sig Alice, Bob, Charlie extends Person {}
run {} for exactly 3 Person  -- changed: don't run any predicates

Now everybody has exactly one follower. There are still 9 potential tuples, but we're no longer storing sets of them for each Person. Put another way, every instance corresponds to an ordered triple of Person objects (Alice's follower, Bob's follower, and Charlie's follower). There will be 3 * 3 * 3 = 27 instances. And indeed, if we click "Next" 26 times, this is what we see. (Whew.)

Now suppose we didn't name the 3 people, but just had 3 anonymous Person objects:

sig Person {
  follower: one Person
}
run {} for exactly 3 Person  -- changed: no named people

The math is still the same: 27 instances. But now we only get 7 before hitting the unsat indicator of "no more instances" in the visualizer.

What's going on?

Forge tries to avoid showing you the same instance multiple times. And, if objects are un-named and the constraints can never distinguish them, instances will be considered "the same" if you can arrive at one by renaming elements of the other. E.g.,

Person1 follows Person2
Person2 follows Person3
Person3 follows Person1

would be considered equivalent to:

Person2 follows Person1
Person3 follows Person2
Person1 follows Person3

since the individual Person atoms are anonymous to the constraints, which cannot refer to atoms by name. We call these instances isomorphic to each other, and say that there is a symmetry between them. Formally, Forge finds every instance "up to isomorphism". This is useful for:

  • increasing the quality of information you get from paging through instances; and
  • (sometimes) improving the runtime on problems, especially if solutions are very rare.

This process isn't always perfect: some equivalent instances can sneak in. Removing all equivalent instances turns out to sometimes be even more expensive than solving the problem. So Forge provides a best-effort, low cost attempt based on a budget for adding additional constraints to the problem, specifically to eliminate symmetries.

You can adjust the budget for symmetry breaking via an option:

  • option sb 0 turns off symmetry breaking; and
  • option sb 20 is the default.

If we turn off symmetry-breaking, we'll get the expected number of instances in the above run: 27.

Symmetry Breaking != Filtering

Forge doesn't just filter instances after they're generated; it adds extra constraints that try to rule out symmetric instances. These constraints are guaranteed to be satisfied by at least one element of every equivalence class of instances. There's a lot of research work on this area, e.g., this paper from 2020.

Relational Forge Q&A

The Truth About Dot

This part of the notes is meant to reinforce what we'd previously done with relational join in Forge. We'll cover some of this in class, but the rest is here for your reference.

Let's go back to the directed-graph model we used before:

#lang forge
sig Person {
    friends: set Person,
    followers: set Person
}
one sig Nim, Tim extends Person {}
pred wellformed {
    -- friendship is symmetric
    all disj p1, p2: Person | p1 in p2.friends implies p2 in p1.friends
    -- cannot follow or friend yourself
    all p: Person | p not in p.friends and p not in p.followers
}
run {wellformed} for exactly 5 Person

pred reachableIn1To7Hops[to: Person, from: Person, fld: Person->Person] {
    to in from.fld or
    to in from.fld.fld or
    to in from.fld.fld.fld or 
    to in from.fld.fld.fld.fld or
    to in from.fld.fld.fld.fld.fld or
    to in from.fld.fld.fld.fld.fld.fld or
    to in from.fld.fld.fld.fld.fld.fld.fld 
    --  ... and so on, for any finite number of hops
    --  this is what you use the transitive-closure operator (^) 
    --  or the reachable built-in predicate for.
}

We said that chaining field access with . allows us to compute reachability in a certain number of hops. That's how reachableIn1To7Hops works.

However, there's more to . than this.

Beyond Field Access

Let's run this model, and open up the evaluator. I'll show the first instance Forge found using the table view:

We saw that Tim.friends produces the set of Tim's friends, and that Tim.friends.friends produces the set of Tim's friends' friends. But let's try something else. Enter this into the evaluator:

friends.friends

This looks like a nonsense expression: there's no object to reference the friends field of. But it means something in Forge:

What do you notice about this result? Recall that this is just a parenthetical way to show a set of tuples: each pair it contains relates a person to one of that person's friends' friends.

Think, then click!

This seems to be the binary relation (set of 2-element tuples) that describes the friend-of-friend relationship. Because we said that friendship is symmetric, everyone who has friends is a friend-of-a-friend of themselves. And so on.


The . operator in Forge isn't exactly field access. It behaves that way in Froglet, but now that we have sets in the language, it's more powerful. It lets us combine relations in a path-like way.

Relational Join

Here's the precise definition of the relational join operator (.):

If R and S are relations (with n and m columns, respectively), then R.S is defined to be the set of (n+m-2)-column tuples:

{(r_1, ..., r_{n-1}, s_2, ..., s_m) | (r_1, ..., r_n) is in R, (s_1, ..., s_m) is in S, and r_n = s_1}

That is, whenever the inner columns of the two relations line up on some value, their join contains some tuple(s) that have the inner columns eliminated.

In a path-finding context, this is why Tim.friends.friends.friends.friends has one column, and all the intermediate steps have been removed: Tim has one column, and friends has 2 columns. Tim.friends is the (1+2-2) = 1-column relation of Tim's friends. And so on: every time we join on another friends, the 2 inner columns are removed.

Let's try this out in the evaluator:

Does this mean that we can write something like followers.Tim? Yes; it denotes the set of everyone who has Tim as a follower:

Note that this is very different from Tim.followers, which is the set of everyone who follows Tim:

Testing Our Definition

We can use Forge to validate the above definition, for relations with fixed arity. So if we want to check the definition for pairs of binary relations, up to a bound of 10, we'd run:

test expect {
    joinDefinitionForBinary: {
        friends.followers = 
        {p1, p2: Person | some x: Person | p1->x in friends and 
                                           x->p2 in followers}
    } for 10 Person is theorem
}

Notice that we don't include wellformed here: if we did, we wouldn't be checking the definition for all possible graphs.

What's Join Good For?

Here's an example. Suppose you're modeling something like Dijkstra's algorithm. You'd need a weighted directed graph, which might be something like this:

sig Node {
    edges: Node -> Int
}

But then edges has three columns, and you won't be able to use either reachable or ^ on it directly. Instead, you can eliminate the rightmost column with join: edges.Int, and then use that expression as if it were a set field.
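
A hedged sketch of that trick (the `connected` predicate is a name we're inventing here):

```forge
sig Node {
    edges: Node -> Int   -- 3 columns: source, target, weight
}

-- `edges.Int` joins away the weight column, leaving a plain Node->Node
-- relation that relates n1 to n2 whenever some edge of *any* weight exists.
pred connected[n1, n2: Node] {
    n2 in n1.^(edges.Int)   -- transitive closure over the unweighted view
}
```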

Liveness and Lassos

Let's return to thinking about our mutual-exclusion model. We had noticed that "every thread, whenever it becomes interested in the critical section, will eventually get access" is a different kind of property—one that requires an infinite counterexample to disprove about the system. We called this sort of property a liveness property.

In a finite-state system, checking a liveness property amounts to looking for a bad cycle: some trace, starting from an initial state, that loops back on itself. Since these traces don't always loop back to the first state, we'll often call these lasso traces, named after a loop of rope. Here's an example. Consider the (reachable states only) transition system we drew last time:

Exercise: Can you find a lasso trace that violates our liveness property?

Think, then click! Here's one of them:
  • ; then
  • ; then
  • ; then back to
  • .

This lasso trace just happens to loop back to its first state. It shows the second process executing forever, while the first process remains uninterested forever.

Is this good or bad? It depends on how we write the property, and what we mean when we write it. If we want to say that no process can WAIT forever, then maybe this is OK. (Or is it?)

Checking Liveness In Forge (Attempt 1)

How could we encode this sort of check in Forge? We wouldn't be able to use the inductive method—at least not in the same way—because we're looking for a bad cycle, not a bad state. So let's use the full finite-trace approach we used to generate games of Tic-Tac-Toe, but expand it to search for a bad cycle.

Setting Up

We'll add the same finite-trace infrastructure as before. This time we're able to use full Forge, so we can use the transpose (~) operator to say that the initial state has no predecessors.

one sig Trace {
    initialState: one State,
    nextState: pfunc State -> State
}

pred trace {
    no Trace.initialState.~(Trace.nextState)
    init[Trace.initialState]
    all s: State | some Trace.nextState[s] implies {
        delta[s, Trace.nextState[s]]
    }
}
pred lasso {
    trace
    all s: State | some Trace.nextState[s]
}

Let's test our lasso predicate to make sure it's satisfiable. And, because we're careful, let's make sure it's not satisfiable if we don't give the trace enough states to loop back on itself:

test expect {
  lassoVacuity: { lasso } is sat
  lassoVacuityNotEnough: { lasso } for 2 State is unsat
}

Beware...

There is actually a hidden overconstraint bug in our lasso predicate. It's not so extreme as to make the predicate unsatisfiable---so the test above passes!

Exercise: What's the problem?

Think, then click!

We said that the initial state has no predecessor. This will prevent the lasso from looping back to the start—it must always have some states before the cycle begins. If the bug we're looking for always manifests as a loop back to the starting state, we would be lulled into a false sense of success by Forge, because it would fail to find the counterexample.

In fact, this would be a problem in the current model, since the counterexample we'd like to find does loop back to the original state.


Watch out for this kind of bug!

This is why thinking through vacuity testing is important. It's also a reason why, maybe, we'd like to avoid having to write all this temporal boilerplate (and potentially introduce bugs when we make a mistake).

Identifying A Bad Cycle

If we know that the trace is a lasso, we can write a predicate that identifies some process being starved. This isn't easy, though. To see why, look at this initial attempt, which says that our property fails if ProcessA never enters the critical section:

pred badLasso {
  lasso
  all s: State | s.loc[ProcessA] != InCS
}
test expect {
  checkNoStarvation: { badLasso } is unsat
}

This is unsatisfiable; the test passes. Could anything potentially be going wrong in our model, though?

We might first wonder, as we usually should, whether the test allocates enough states to reasonably find a counterexample. We've got 8 reachable states, so maybe we'd need 8 (or 9?) states in the test. That's one genuine problem! But there's something else wrong here, and it's more subtle. We'll need to address both of these problems to make progress.

Exercise: What else is wrong?

Think, then click!

The badLasso predicate wouldn't hold true if the system allowed ProcessA to enter the critical section once (and only once). We want ProcessA to be able to get access (eventually) whenever it needs that access. That is, we need to say that the loop of the lasso contains no states where ProcessA is in the critical section, no matter what happens before the cycle starts.


That sounds like a lot of work: we'd need to identify the sub-sequence of states representing the cycle and properly apply the constraint to only those states. More importantly, it sounds really easy to get this wrong if we write it ourselves, at this low level of abstraction, because the specifics might change as the model evolves.

There's a better way. Forge has another language level—Temporal forge—that provides all of this automatically.

Temporal Forge

Temporal Forge works by adding additional operators to the language. You can still do everything you could before in Relational Forge (with one exception that we'll get to), but you get new operators, like always and eventually, that make describing properties and specifying systems easier. Moreover, these operators are standard in industrial model-checking tools.

Temporal Forge takes away the need for you to explicitly model traces, and gives some convenient syntax for describing what can change over time. But there is something to watch out for: Temporal Forge forces the engine to only ever find lasso traces. It's very convenient if that's what you want, but don't use it if you don't want a lasso trace!

Example: Integer Counter

Here's a quick example that motivates how Temporal Forge works, and some pitfalls that can occur when you use it. Suppose that we're modeling a system with a single integer counter:

#lang forge/temporal

-- enable traces of up to length 10
option max_tracelength 10

one sig Counter {
  -- The value of the "counter" field may vary over time.
  var counter: one Int
}

run {
  -- Temporal-mode formulas are interpreted from an implicit current time. By default, 
  -- we're talking about the start of the trace. So, the counter starts at 0.
  Counter.counter = 0
  -- `always` means "now and at every time in the future". The prime (') operator 
  -- says "value in the next state". So this means that the counter is incremented 
  -- with every transition:
  always Counter.counter' = add[Counter.counter, 1]
} for 3 Int

First, notice that we didn't need to define the structure of a trace. We got that for free by using Temporal Forge, which only ever finds traces.

Second, this is satisfiable, as we might expect. But what happens if you change the bound on Int to for 4 Int?

Exercise: Try it. What happens? Why do you think it did happen?

Think, then click!

It's unsatisfiable with the new bound. This is strange: usually when we increase the scope on a sig, without using the exactly keyword, we only ever make something satisfiable (because we're increasing the possible sizes of instance that Forge checks).

The problem is that, because only lasso traces are found, Forge can only satisfy this model by exploiting integer overflow. At 3 Int, which supports values between -4 and 3, the counter progresses like this: 0, 1, 2, 3, (overflow to) -4, -3, -2, -1, 0 and so on. We have 10 states to work with, which is enough to wrap around back to 0.

In contrast, at 4 Int, the values range from -8 to 7. Overflow will happen as normal, but only when moving from 7 to -8. The 10 states we have won't be enough to actually loop back around to 0—and there must be a loop somewhere in a lasso trace. So Temporal Forge says that the model is unsatisfiable at trace length 10, for 4 Int.

Converting to Temporal Forge

Let's convert the mutual-exclusion model into Temporal Forge. We'll add the necessary options first:

#lang forge/temporal

option max_tracelength 10

Option Names

Be sure to include the underscore! A misspelling like maxtracelength isn't a valid option name.

Now we'll update the data definitions. We no longer have a State sig. Or rather, we do, but we would only have one of them and let its fields vary with time. Because both the flags and loc fields change over time, we'll make both of them var, and we'll change the name of State to avoid confusion:

one sig World {
  var loc: func Process -> Location,
  var flags: set Process
}

At any moment in time, every process is in exactly one location. And, at any moment in time, each process has either raised or lowered its flag.

Now for the predicates. In Temporal Forge, we don't have the ability to talk about specific pre- and post-states: the language handles the structure of traces for us. This means we have to change the types of our predicates. For init, we have:

-- No argument! Temporal mode is implicitly aware of time
pred init {
    all p: Process | World.loc[p] = Uninterested
    no World.flags 
}

The loss of a State sig is perhaps disorienting. How does Forge evaluate init without knowing which state we're looking at? In Temporal Forge, every constraint is evaluated not just in an instance, but in that instance at some moment in time. You don't need to explicitly mention the moment. So the formula no World.flags is true exactly when, at the current time, there's no flags raised. Using init is like saying "are we in an initial state right now?"

Similarly, we'll need to change our transition predicates:

-- Only one argument; same reason as above
pred raise[p: Process] {
    // pre.loc[p] = Uninterested
    // post.loc[p] = Waiting
    // post.flags = pre.flags + p
    // all p2: Process - p | post.loc[p2] = pre.loc[p2]

    // GUARD
    World.loc[p] = Uninterested
    // ACTION
    World.loc'[p] = Waiting
    World.flags' = World.flags + p
    all p2: Process - p | World.loc'[p2] = World.loc[p2]
}

I've left the old version commented out so you can contrast the two. Again, the predicate is true subject to an implicit moment in time. The priming (') operator means "this expression in the next state"; so if raise holds at some point in time, it means there is a specific relationship between the current and next moments.
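
For example, the enter transition might convert like this (a sketch: the guard reflects the flag-based mutual-exclusion logic from the earlier model, whose details may differ slightly in your version):

```forge
pred enter[p: Process] {
    // GUARD: p is waiting, and no *other* process has its flag raised
    World.loc[p] = Waiting
    World.flags in p
    // ACTION: p enters the critical section; the flags are unchanged
    World.loc'[p] = InCS
    World.flags' = World.flags
    all p2: Process - p | World.loc'[p2] = World.loc[p2]
}
```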

We'll convert the other predicates similarly, and then run the model:

run {
    -- start in an initial state
    init
    -- in every state of the lasso, the next state is defined by the transition predicate
    always {some p: Process | raise[p] or enter[p] or leave[p]}
}

This is the general shape of things! There are still some potential problems remaining, but we'll get to them shortly. First, we need to talk about how to view a Temporal Forge instance.

Running A Temporal Model

When we run, we get something that looks like this:

New Buttons!

In temporal mode, Sterling has 2 "next" buttons, rather than just one.

  • The "Next Trace" button will hold all non-var relations constant, but ask for a new trace that varies the var relations.
  • The "Next Config" button forces the non-var relations to differ. Having different buttons for these two ideas is useful, since otherwise Forge is free to change the value of any relation, even if it's not what you want changed.

Trace Minimap

In Temporal Forge, Sterling shows a "mini-map" of the trace in the "Time" tab. You can see the number of states in the trace, as well as where the lasso loops back.

It will always be a lasso, because Temporal Forge never finds anything but lassos.

You can use the navigation arrows, or click on specific states to move the visualization to that state:

Theming works as normal, as do custom visualizers (although read the documentation if you're writing your own visualizer; there are some small changes like using instances instead of instance to access data).

So what's missing?

Do not try to use example in temporal mode. The example and inst constructs define bounds for all states at once in Temporal Forge. While inst can still be very useful for optimization, example will prevent anything it binds from ever changing in the trace, which isn't very useful for test cases. Instead, use constraints.

Linear Temporal Logic

Formally, the temporal operators Forge provides come from a language called Linear Temporal Logic (or LTL). It's temporal because it has time-aware operators like always and eventually, and it's linear because it's interpreted over (infinite) linear traces.

Industrial Practice

LTL is commonly used in industry. And even where it isn't used directly, many other temporal specification languages are either related to LTL (e.g., branching-time logics like CTL) or inspired by it (e.g., Lamport's TLA+ and other more recent languages). There are many industrial model-checking tools out there, using many different languages, but learning LTL will help you build intuition for nearly all of them.

And, on the research side of things, there's a lot of temporal logic use as well: for example, work on using temporal logics in AI to specify robot behaviors.

How To Read A Temporal Formula

Recall that:

  • time is implicit in temporal mode, and
  • temporal mode only ever finds lasso traces.

When you write a constraint in temporal mode, it's true with respect to an instance (which is always a lasso trace) and a time index into that trace. Thus the init predicate may be true, or not true, depending on which state you're looking at.

Evaluation always starts at the first state of the trace. This is the context for the top-level run command or test. I didn't say "the initial state", because even if we've written a predicate that encodes "initial state", it won't be enforced unless we tell Forge to enforce it. This is why you'll usually start your top-level run with init (or whatever your initial-state predicate is named).

As soon as temporal operators become involved, however, the "evaluate in the first state" rule starts to change because the current time index may change.

Moving Forward In Time

You can refer to the next state (relative to the current one, whatever it is) by using the next_state operator. If I wanted to say that the second and third states would also need to be initial states, I'd write more in the top-level run block:

init
next_state init
next_state next_state init

It's rare you'd do something like this in practice, but it's a good first demonstration of the operator.

Why next_state?

The keyword is, admittedly, a little verbose. But it was the best of the various candidates at hand:

  • In many settings, this LTL operator is called X, which is not very descriptive. (When I read X, I think "variable"!)
  • In Forge's parent language, Alloy 6, the operator is called after, but this can lead to misconceptions: A after B might be misread as a binary operator, and because Forge and Alloy both insert an implicit and at newlines, A after B would actually mean A and (after B).
  • I've heard afterward suggested, but that risks confusion with always or eventually.

Quantifying Over Time

What does it mean for something to always be true, or to eventually hold? These terms quantify over time: if something is always true, it's true at all time indexes (starting now). If something is eventually true, it's true at some time index (possibly now).

So if we wanted to say that every state in the trace transitions to its successor in accordance with our move predicate, we'd say: always move. In the last section, because we had multiple predicates, each taking an argument, we wrote:

always {some p: Process | raise[p] or enter[p] or leave[p]}

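This "quantify over time" description can be made precise. Here is a sketch of the standard LTL semantics, written using Forge's keyword names: a formula is evaluated against a trace σ and a time index i into that trace.

```latex
\begin{aligned}
\sigma, i &\models \texttt{next\_state}\ \varphi &&\text{iff}\quad \sigma, i+1 \models \varphi \\
\sigma, i &\models \texttt{always}\ \varphi &&\text{iff}\quad \sigma, j \models \varphi \text{ for all } j \geq i \\
\sigma, i &\models \texttt{eventually}\ \varphi &&\text{iff}\quad \sigma, j \models \varphi \text{ for some } j \geq i
\end{aligned}
```

Notice how always and eventually shift the "current" index i, which is why nesting them (as we do next) is meaningful.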
Nesting Operators

Just like you can nest all and some, you can nest always and eventually. We'll need to do this to express properties like non-starvation. In fact, let's think about how to express non-starvation now!

We had informally written non-starvation in our mutex model as something like "once a process becomes interested, it eventually gets access". How would you write this using temporal operators, assuming that interested and access were predicates describing the process becoming interested and getting access respectively?

Exercise: Try this!

Think, then click for a possible solution.

We might start with: interested => eventually access. That would be a reasonable start: if the process is interested, it eventually gets access. The problem is that the interest is measured now---that is, at whatever time index Forge is currently looking at.


Clearly we need to add some sort of temporal operator that prevents the above issue. Here's a possible candidate: (eventually interested) => (eventually access).

Exercise: What's wrong with this one?

Think, then click!

The problem here is that there's no connection between the time at which the left-hand side holds, and the time at which the right-hand side holds. To force that relationship (access after interest) we need to nest the two temporal quantifiers.


How about eventually (interested => (eventually access))?

Think, then click!

This constraint isn't strong enough. Imagine a trace where the process never gets access, but is interested only (say) half the time. Then any of those disinterested states will satisfy the subformula interested => (eventually access).

Why? Think about how an implication is satisfied. It can be satisfied if the right-hand side is true, but also if the left-hand side is false—in the case where no obligation needs to be incurred! So the implication above evaluates to true in any state where the process isn't interested. And using eventually means any single such state works...


It seems like we need a different temporal operator! What if we combine always and eventually?

Exercise: Give it a try.

Think, then click!

We'll say: always {interested => eventually access}. Now, no matter what time it is, if the process is interested, it has to eventually get access.

This sort of always-eventually pattern is good for (contingent) "infinitely often" properties, which is exactly what non-starvation is.
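As a sketch in Forge syntax, assuming hypothetical one-argument predicates interested and access over a Process sig, the property might be packaged like this:

```forge
pred nonStarvation {
    -- no matter the time index, interest must eventually be answered with access
    all p: Process | always { interested[p] => eventually access[p] }
}
```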

Let's Try It Out!

I'm going to ask you to play the role of Forge. I've listed some temporal constraints below, and would like you to come up with some instances (lasso traces) that satisfy them. Don't use Forge unless you've already tried, and are stumped.

For all examples, you may come up with your own shape of the world. That is, you might pick a University (where a state is a semester, with courses offered and taken) or your experience waiting for an elevator in the CIT, or anything else from your life! I'll use A, B and C to denote arbitrary facts that might be true, or not true---your job is to plug in specifics, and then find a satisfying trace!

I'll use a lot of parentheses, to avoid confusion about operator precedence...

Exercise: eventually (always (A or B))

Think, then click!

Suppose A stands for weekday, and B for weekend. Then the normal progression of time satisfies the constraint: after some point (perhaps right now) it will always be either a weekday or a weekend.

I am probably abstracting out some important details here, like the heat-death of the universe. But that's not really the point. The point is that alternation between A and B is allowed---it's always either one or the other, or possibly even both.


Exercise: always (eventually (A and (next_state B)))

Think, then click!

Suppose A stands for "Saturday", and B for "Sunday". Then it's always true that, at any point in time, there is a Saturday-Sunday pair of days in the future.


Exercise: always ((eventually A) or (always B))

Think, then click!

Suppose A stands for "final exam or demo", and B for "graduated". Then, at any point in time, either there's a final in your future or you've graduated (and stay graduated forever).

Note that this doesn't mean you can't take a final exam after graduating if you want to. Both sides of the or can be true. It just means that, at any point, if you haven't graduated permanently, you must eventually take an exam.


Deadlock in our Lock Model

A deadlock state is one where no outgoing transitions are possible. How can we write a test in Temporal Forge that tries to find a reachable deadlock state? There are two challenges:

  • How do we phrase the constraint, in terms of the transition predicates we have to work with?
  • How do we even allow Forge to find a deadlock, given that temporal mode only ever finds lasso traces? (A deadlock in a lasso trace is impossible, since a deadlock prevents progress!)

Let's solve the second challenge first, since it's more foundational.

Finding Deadlocks Via Lassos

We could prevent this issue by allowing a doNothing transition from every state. Then from Forge's perspective there's no "deadlock", and a lasso trace can be found. We can add such a transition easily enough:

pred doNothing {
    flags' = flags
    loc' = loc
}
run {
    init
    always { 
        (some p: Process | {raise[p] or enter[p] or leave[p]})
        or doNothing 
    }
}

But we have to do so carefully, or the fix will cause new problems. If we allow a doNothing transition to happen anywhere, then Forge can find traces where nothing ever happens. In that case, our liveness property is definitely going to fail, even if we were modeling a smarter algorithm. So we need to reduce the power of doNothing somehow.

Put another way: we started with an overconstraint bug: if only lassos can be found, then we can't find a trace exhibiting deadlock. Adding a powerful doNothing fixes the overconstraint but adds a new underconstraint, because we'll get a lot of garbage traces where the system can just pause arbitrarily (while the trace continues).

We saw this phenomenon earlier when we were modeling tic-tac-toe games, and wanted to work around the fact that the is linear annotation forces exact bounds. We can use the same ideas in the fix here.

Finding Deadlock

Let's look at one of our transitions:

pred raise[p: Process] {
    // GUARD
    World.loc[p] = Disinterested
    // ACTION
    World.loc'[p] = Waiting
    World.flags' = World.flags + p
    all p2: Process - p | World.loc'[p2] = World.loc[p2]
}

Notice it's split into a "guard" and an "action". If all the constraints in the guard are true, the transition can occur. Formally, we say that if all the guard constraints hold, then the transition is enabled. When should doNothing be enabled? When no other transition is. What if we made an "enabled" predicate for each of our other transitions? Then we could write:

pred doNothing {
    -- GUARD (nothing else can happen)
    not (some p: Process | enabledRaise[p]) 
    not (some p: Process | enabledEnter[p]) 
    not (some p: Process | enabledLeave[p]) 
    -- ACTION (no effect)
    flags' = flags
    loc' = loc
}

We won't create a separate enabledDoNothing predicate. But we will add doNothing to the set of possible moves:

always { 
    (some p: Process | {raise[p] or enter[p] or leave[p]})
    or doNothing 
}

And we'd also better create those 3 enabled predicates, too. E.g., we might write:

pred enabledRaise[p: Process] {
    World.loc[p] = Disinterested 
}

which we would then use in the raise transition predicate.
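For instance, the raise transition from before might be refactored like this sketch, with its guard replaced by a call to the enabled predicate and its action unchanged:

```forge
pred raise[p: Process] {
    -- GUARD (now factored into its own predicate)
    enabledRaise[p]
    -- ACTION (same as before)
    World.loc'[p] = Waiting
    World.flags' = World.flags + p
    all p2: Process - p | World.loc'[p2] = World.loc[p2]
}
```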

Finally, we can write a check looking for deadlocks:

test expect {
    noDeadlocks_counterexample: {
        -- setup conditions
        init
        always { 
            (some p: Process | {raise[p] or enter[p] or leave[p]})
            or doNothing 
        }
        -- violation of the property
        not always {
            some p: Process |
                enabledRaise[p] or
                enabledEnter[p] or
                enabledLeave[p] 
        }
    } is sat
}

This test passes: Forge finds a satisfying trace, which is a counterexample showing that a deadlock is reachable. The counterexample (at least, the one I got) is 3 states long, and in the final state both processes are Waiting. Success! Or, at least, success in finding the deadlock. But how should we fix the algorithm? And how can we avoid confusion like this in the future?

Obligations and The Past

Temporal Forge Reminders

  • Remember the definition of "configuration": the value of all relations that aren't marked var. Thus, if you click the Sterling button that asks for a new configuration, the solver will always find a new trace that varies on one or more of those relations.
  • Do not try to use example in temporal mode. In fact, it's disabled; you'll get an error message. For reasons we'll get to soon (when we talk about how Forge works) example and inst both constrain all states in temporal mode, and so an example will prevent anything it binds from ever changing in the trace. This can still be useful for optimization, but isn't so great if you're trying to write an example where things change over time. Start using assert more often!

Temporal Forge Documentation

Consult the Forge documentation if you want further clarification on temporal operators.

Suppose we've written a model where stopped and green_light are predicates that express our car is stopped, and the light is green. Now, maybe we want to write a constraint like, at the current moment in time, it's true that:

  • the light must eventually turn green; and
  • the stopped predicate must hold true until the light turns green.

We can write the first easily enough: eventually green_light. But what about the second? We might initially think of writing something like: always {not green_light implies stopped}. But this doesn't quite express what we want.

Exercise: Why not?

Think, then click!

The formula always {not green_light implies stopped} says that at any single moment in time, if the light isn't green, our car is stopped. This isn't the same as "the stopped predicate holds until the light turns green". For one thing, the latter applies only until green_light happens; after that, there's no obligation remaining on stopped for the rest of the trace.


In LTL, the until operator can be used to express a stronger sort of eventually. If I write stopped until green_light, it encodes the meaning above. This operator is a great way to phrase obligations that might hold only until some releasing condition occurs.

Strong vs. Weak Until

Some logics include a "weak" until operator that doesn't actually enforce that the right-hand side ever holds, and so the left-hand side can just be true forever. But, for consistency with industrial specification languages (at least, most of those we're familiar with) Forge's until is "strong": it requires the right-hand side hold eventually.
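If you ever do want the weak version, it's expressible with the operators we already have. A sketch, reusing the stopped and green_light predicates from above:

```forge
-- "weak until": either stopped holds forever,
-- or it holds until green_light occurs (and green_light must then occur)
(always stopped) or (stopped until green_light)
```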

The car doesn't have to move!

The until operator doesn't prevent its left side from being true after its right side is. E.g., stopped until green_light doesn't mean that the car has to move immediately (or indeed, ever) once the light is green. It just means that the light eventually turns green, and the car can't move until then.

The Past (Rarely Used, but Sometimes Useful)

Forge also includes temporal operators corresponding to the past. Past-time operators aren't supported by all LTL tools, but we include them for convenience. It turns out that they don't increase the expressive power of the language, but they do make some constraints much easier, and sometimes much more concise, to write. Here are some examples:

prev_state init

This means that the previous state satisfied the initial-state predicate. Beware: traces are infinite in the forward direction, but not infinite in the backward direction. For any subformula myPredicate, prev_state myPredicate is false if the current state is the first state of the trace.

Past-Time always and eventually: historically and once

We can use historically to mean that something held in all past states and in the current one. I can write "I've never been skydiving" as historically {not skydiving[Tim]}. But if I go skydiving tomorrow, that formula won't be true tomorrow.

Likewise, once means that there was a time in the past (or currently) where something held. If I've been skydiving at all in my life, or if I'm skydiving right now, I could write once {skydiving[Tim]}.
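As a small sketch (assuming the hypothetical skydiving predicate from the examples above), the past-time operators combine naturally with the future-time ones, e.g., to find a trace containing Tim's first-ever jump:

```forge
run {
    -- some state where Tim is skydiving, but had never been before it
    eventually {
        skydiving[Tim]
        prev_state historically { not skydiving[Tim] }
    }
}
```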

Peterson Lock and Fairness

Let's finally fix the broken locking algorithm we modeled before. We decided that it would be a bother to encode lasso traces manually in Relational Forge, but it seems like Temporal Forge might give us exactly what we need.

The final version of the model will be in peterson.frg.

Fixing Deadlock

Our little algorithm is 90% of the way to a working mutex called the Peterson lock. The Peterson lock just adds one extra bit of state, and one transition to set that bit. In short, if our current algorithm is analogous to raising hands for access, the extra bit is like saying "no, you first" when two people are trying to get through a door at the same time; that's exactly the sort of situation our both-flags-raised deadlock represented.

TODO: add transition diagram here, and probably for lock1

To represent which process (if any) has most recently said "no, you go first", we'll add a polite: lone Process field to the World sig. The algorithm now needs to take a step to set this value. It goes something like this (in pseudocode) after the process becomes interested:

// location: uninterested 
this.flag = true
// location: halfway   (NEW)
polite = me         
// location: waiting 
while(other.flag == true || polite == me) {} // hold until their flag is lowered _or_ the other is being polite
// location: in CS 
run_critical_section_code(); // don't care details
this.flag = false

Because we're modeling individual operations executing, we'll need to add a new location to the state, which I'll call Halfway. We'll also need a new transition (and to change existing transitions in some places). The new transition might look something like this:

pred enabledNoYou[p: Process] {
    World.loc[p] = Halfway
}
pred noYou[p: Process] {
    -- GUARD
    enabledNoYou[p]
    -- ACTION
    World.loc'[p] = Waiting
    World.flags' = World.flags
    World.polite' = p
    all p2: Process - p | World.loc'[p2] = World.loc[p2]
}

We'll need a small edit in raise, to set

World.loc'[p] = Halfway

instead of

World.loc'[p] = Waiting

and a modification to the enter transition so that it's enabled if either nobody else has their flag raised or the current process isn't the one being polite anymore:

pred enabledEnter[p: Process] {
    World.loc[p] = Waiting 
    -- no other processes have their flag raised *OR* this process isn't the polite one
    (World.flags in p or World.polite != p)
}

Then we add the new transition to the overall transition predicate, to doNothing, to the deadlock check test—anywhere we previously enumerated possible transitions.

We also need to expand the frame conditions of all other transitions to keep polite constant if we aren't explicitly modifying it.
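For example, raise never mentions politeness, so (as a sketch) each such transition would gain a frame condition like:

```forge
-- this transition leaves the polite field unchanged
World.polite' = World.polite
```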

Watch out!

Beware of forgetting (or accidentally including) primes. This can lead to unsatisfiable results, since the constraints won't do what you expect between states. E.g., an accidental double-prime will mean "in the state after next".

Let's Check Non-Starvation

noStarvation: {
    lasso implies {
        all p: Process | {
            always {
                -- Beware saying "p in World.flags". Why? Read on...
                World.loc[p] = Waiting =>
                    eventually World.loc[p] = InCS
            }
        }
    }
} is theorem

This passes. Yay!

Should we be suspicious?

That was really easy. Everything seems to be working perfectly. Maybe we can stop early and go get ice cream.

But we should probably do some validation to make sure we haven't missed something. Here's a question: is our domain model realistic enough to trust these results?

Abstraction Choices We Made

We made a choice to model processes as always eventually interested in accessing the critical section. There's no option to remain uninterested, or even to terminate (as many processes do in real life). Suppose we allowed processes to become uninterested and go to sleep. How could this affect the correctness of our algorithm, or threaten our ability to verify the non-starvation property with the current model?

Think, then click!

The property might break because a process's flag is still raised as it is leaving the critical section, even if it never actually wants access again. So the implication is too strong. It is more correct to say World.loc[p] = Waiting => ... than p in World.flags, which is why we use it in the above.

But even the correct property will fail in this case: there's nothing that says one process can't completely dominate the overall system, locking its counterpart out. Suppose that ProcessA is Waiting and then ProcessB stops being interested. If we modeled disinterest as a while loop, perhaps using doNothing or a custom stillDisinterested transition, then ProcessB could follow that loop forever, leaving ProcessA enabled, but frozen, because our model only lets one process transition at a time.

Let's deal with that second problem now.

Fairness

In the real world, it's not just the process itself that says whether or not it runs forever; it's also up to the operating system's scheduler, the system's hardware, etc. Thus, non-starvation is contingent on some preconditions that the domain must provide. Without the scheduler being at least somewhat fair, even the best algorithm can guarantee very little.

Let's add the precondition, which we'll call "fairness". Again, keep in mind that, in this context, fairness is not a property our locking system must guarantee. Rather, it's something our locking system needs from its environment. This is a common thing to see in verification and, for that matter, in all of computer science. To underscore this point, you might review the binary search tree invariant. Is the BST invariant a property for the search to guarantee, or a precondition for its correctness? Of course, the status of a given property can shift depending on our perspective: if we're modeling the algorithm that adds to a binary search tree, preserving the invariant is now very important.

Other precondition examples

Is Dijkstra's algorithm always correct, on any weighted directed graph? Is RSA encryption absolutely secure, no matter how powerful or how lucky the adversary is?

Expressing Fairness

There are many ways to phrase fairness, and since we're making it an assumption about the world outside our algorithm, we'd really like to pick something that suffices for our needs, but isn't any stronger than that. Once we add the weakFairness predicate below as an assumption, the properties pass.

pred weakFairness {
    all p: Process | {
        (eventually always 
                (enabledRaise[p] or
                enabledEnter[p] or
                enabledLeave[p] or
                enabledNoYou[p])) 
        => 
        (always eventually (enter[p] or raise[p] or leave[p] or noYou[p]))        
    }
}

This may initially look a little strange. It still looks rather strange to me, years after learning it. It just happens that there are multiple ways to express fairness. Hillel Wayne has a great blog post on the differences between them. (Unfortunately it's in a different modeling language, but the ideas come across well.) Regardless, "weak" fairness is sufficient for our needs! It says that if a process remains ready to move forward forever (possibly via a different transition per state), it must be allowed to proceed infinitely often.