What's special about transformers is they allow each element in a sequence to decide which parts of data are most important to it from each other element in the sequence, then extract those out and compute on them. The big theoretical advantage over RNNs (which were used for sequences prior to transformers), is that transformers support this in a lossless way, as each element has full access to all the information in every other element in the sequence (or at least all the ones that occurred before it in time sequences). RNNs and "linear transformers" on the other hand compress past values, so generally the last element of a long sequence will not have access to all the information in the first element of the sequence (unless the RNN internal state was really really big so it didn't need to discard any information).

https://news.ycombinator.com/item?id=38859976

There's a line between the discovered stuff and the invented stuff. I always love to see it spelled out in different ways.

https://news.ycombinator.com/item?id=38863339

Dear Reader. I am inviting you to spend many pages with me. Before deciding whether to accept my invitation, you may want to know who I am. I was educated as a mathematician; my doctoral thesis was on partial differential equations. While a student, I worked part-time and summers as a programmer.

– Leslie Lamport, introduction to a book, https://lamport.azurewebsites.net/tla/science.pdf

If you’re building complex, expensive systems, my workshops on software modeling can help you save hundreds of thousands in saved developer time and maintenance work. With my training you’ll learn how to

  • Catch complex bugs that would take weeks or months to fix, and fix them before you start writing code.
  • Build complicated systems quickly and with confidence.
  • Better understand, document, and modify existing legacy systems.

(...)

I specifically offer workshops in TLA+ and Alloy.

https://hillelwayne.com/consulting/

Amazon modeled DynamoDB and S3, finding complex bugs in both. One was so complicated that “the shortest error trace exhibiting the bug contained 35 high level steps”. They had slipped past all their tests, QA, and code review, and would have lost customer data if it reached production. They also confidently made several performance optimizations and were able to test they didn’t lead to problems later. One engineer later estimated that software modeling cut two months off a four-month schedule.

https://hillelwayne.com/consulting/

Scientists should be used to reading math. You may find the math hard if you’re an engineer. But unless miseducation has burdened you with an insurmountable fear of mathematics, I encourage you to give the book a try. Learning the math will improve your thinking.

https://lamport.azurewebsites.net/tla/science.pdf

The book assumes only that you know the math one learns before entering a university.

https://lamport.azurewebsites.net/tla/science.pdf

I started writing concurrent algorithms around 1973, and I quickly learned that they were hard to get right. The many possible orders in which the operations of different processes can be executed leads to an enormous number of possible executions that have to be considered. The only way to ensure that the algorithm worked correctly was to prove that it did.

https://lamport.azurewebsites.net/tla/science.pdf

I have spent most of my career as a member of industrial research labs. The computer science I have done has been motivated by the problems facing system builders—sometimes before they were aware of those problems. I have devoted the last part of my career to developing tools to help them — both intellectual tools to help them think better and programs to help them detect errors before they are implemented in code. These tools are based on what I learned by writing and reasoning about concurrent algorithms.

https://lamport.azurewebsites.net/tla/science.pdf

By the time I designed TLA+, model checking had become a practical tool for checking the correctness of abstract programs. A model checker can essentially check correctness of all possible executions of a very small instance of an abstract program. This turns out to be very effective at detecting errors.

https://lamport.azurewebsites.net/tla/science.pdf

A concurrent program that has run correctly for years can start producing frequent errors because a small change to the computer hardware, the operating system, or even the other programs running at the same time causes incorrect executions that have never occurred before. The only way to prevent this is to ensure that every possible execution satisfies the behavioral properties.

https://lamport.azurewebsites.net/tla/science.pdf

A method of writing proofs is proposed that makes it much harder to prove things that are not true. The method, based on hierarchical structuring, is simple and practical.

(...)

The proofs in Newton’s Principia differ in style from those of a modern textbook only by being written in Latin. Proofs are still written like essays, in a stilted form of ordinary prose.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

Anecdotal evidence suggests that as many as a third of all papers
published in mathematical journals contain mistakes — not just minor errors, but incorrect theorems and proofs.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

The expression (...) [in mathematical notation] is quite long, but it is easy to read because of its structure.

The same principles that make formulas easier to understand can make proofs easier to understand: proof steps should be referred to by name, and the structure of the proof should be manifest.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

The ideal tool for reading a structured proof would be a computer-based
hypertext system. It would allow the reader to concentrate on a particular level in the structure, suppressing lower-level details. In a printed version, one can ignore lower-level details only by skipping over that part of the text. While this is not ideal, the structure is displayed by the format, making such skipping fairly easy—certainly much easier than in a prose-style proof, where the format provides little clue to the logical structure.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

After the statement of the theorem comes a Proof Sketch, which is an informal explanation of the following proof. The proof sketch serves as a “road map” to the proof, helping the reader understand intuitively why the proof works.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

Some twenty years ago, I decided to write a proof of the Schroeder-Bernstein theorem for an introductory mathematics class. The simplest proof I could find was in Kelley’s classic general topology text [4, page 28]. Since Kelley was writing for a more sophisticated audience, I had to add a great deal of explanation to his half-page proof. I had written five pages when I realized that Kelley’s proof was wrong. Recently, I wanted to illustrate a lecture on my proof style with a convincing incorrect proof, so I turned to Kelley. I could find nothing wrong with his proof; it seemed obviously correct! Reading and rereading the proof convinced me that either my memory had failed, or else I was very stupid twenty years ago. Still, Kelley’s proof was short and would serve as a nice example, so I started rewriting it as a structured proof. Within minutes, I rediscovered the error.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

He had already written conventional proofs—proofs that were good enough to convince us and, presumably, the referees. Rewriting the proofs in a structured style, we discovered that almost every one had serious mistakes, though the theorems were correct.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

Time and again, we would make a conjecture and write a proof sketch on the blackboard — a sketch that could easily have been turned into a convincing conventional proof — only to discover, by trying to write
a structured proof, that the conjecture was false. Since then, I have never believed a result without a careful, structured proof. My skepticism has helped avoid numerous errors.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

In a properly written proof, where every use of an assumption or a proof step is explicit, simple text searching reveals exactly where every hypothesis is used.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

One must be a skeptical reader of one’s own proofs. My own rule of thumb is to expand the proof until the lowest level statements are obvious, and then continue for one more level. This takes discipline. But, unlike conventional proofs, in which adding more detail can make a proof more confusing, structured proofs accommodate as much detail as desired.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

Avoiding mistakes when proving theorems requires careful, de-
tailed proofs. When first shown a detailed, structured proof, most mathe-
maticians react: “I don’t want to read all those details; I want to read only the general outline and perhaps some of the more interesting parts.” My response is that this is precisely why they want to read a hierarchically structured proof. The high-level structure provides the general outline; readers can look at as much or as little of the lower-level detail as they want.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

The best way to read a structured proof is level by level—first reading the high-level steps 〈1〉1, 〈1〉2, 〈1〉3, . . . , then the proofs of those steps, and so on. However, having to skip over the lower-level steps makes reading the high-level ones inconvenient. With hypertext, this is not a problem.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

Computer scientists are more willing to explore unconventional proof
styles. Unfortunately, I have found that few of them care whether they
have published incorrect results. They often seem glad that an error was
not caught by the referees, since that would have meant one fewer publica-
tion. I fear that few computer scientists will be motivated to use a proof
style that is likely to reveal their mistakes. Structured proofs are unlikely to be widely used in computer science until publishing incorrect results is considered embarrassing rather than normal.

https://lamport.azurewebsites.net/pubs/lamport-how-to-write.pdf

Most software flaws come from one of two places. When the code doesn't match our expectations, it could be that the code is wrong. Most software correctness techniques - types, tests, etc - are used to check the code. But it could instead be that the code is correct and our expectations are wrong: there's a fundamental error in our design. These errors, called specification errors, are some of the most subtle and dangerous bugs. They can span multiple independent programs, occur in convoluted race conditions, or depend on physical phenomena. Our regular tools simply can't find them.

https://www.dabeaz.com/tla.html

For example, if you don't know how to copy a word with just keyboard shortcuts in Emacs, the Edit menu's Copy, Cut, and Paste selections provide the path of least resistance. There's no reason to punish yourself for choosing Emacs. Use its menus, use your mouse to select regions and click in-buffer buttons, and don't let unfamiliarity stand in your way of being productive.

https://opensource.com/article/20/3/getting-started-emacs

One advantage to knowing Emacs keyboard shortcuts is that many of them also apply in Bash:

  • C-a—go to the beginning of a line
  • C-e—go to the end of a line

https://opensource.com/article/20/3/getting-started-emacs