In this month’s issue:
Maximise your lack of commitment with E T Jaynes in the white stuff.
The dunghill considers the perfect conditions for a market built on pseudoscience.
How the idea of AI task environments can be used for feasibility studies in semi-supervised.
Plus a response from Eppo to last month’s dunghill, and frustrations over badly behaved LLMs.
Semi-supervised
When the LLM bubble goes pop, will we all be rushing back to Russell and Norvig? I like to think so. Beautifully written, terse but clear, effortlessly multi-disciplinary, and even finding space for historical notes at the end of each chapter, it’s my favourite AI textbook by some distance. Since AI proper is about building autonomous agents that need to cope with a wide range of environments, the book is also, indirectly, a compendium of problem-solving techniques. All we need do is substitute ourselves for the autonomous agent.
In this vein, I have repurposed Russell and Norvig’s classification of AI task environments as a method for conducting feasibility studies. The result is a tool that tells me quickly just how difficult a project is going to be, and gives me a language to explain the difficulties.
Let me explain. For Russell and Norvig, task environments are the specific worlds in which autonomous agents must operate to solve problems - in a sense, they are the problem. For the robot vacuum cleaner, the task environment is a floor interrupted by objects; for the AI chess program, it is the board, the pieces, the rules, and so on. Russell and Norvig categorise the possible environments using a series of dichotomies. Is the environment…
1. Fully observable or partially observable?
2. Single-agent or multi-agent?
3. Competitive or co-operative?
4. Deterministic or stochastic?
5. Episodic or sequential?
6. Static or dynamic?
7. Discrete or continuous?¹
Since the option on the right-hand side is nearly always the more difficult one, this has become my feasibility checklist for any problem environment I happen to be given. If everything is on the right, we’re in trouble.
Take the apparently simple example of a sales forecast. Here the environment is a marketplace, filled with buyers and sellers, and impacted by all manner of internal and external forces. It is undoubtedly a partially observable environment (we cannot know every relevant fact, unlike, say, a game of chess, where we can see the full board and the location of every piece). It is also a competitive, multi-agent environment: if the business acts as a result of the forecast then a competitor may react, invalidating the forecast - unless, that is, we build in their expected actions. On top of this, it is highly stochastic; it happens in continuous time; it is sequential (what happens next depends on actions taken now); and it is dynamic (the situation can change while we are deliberating). This is why forecasting markets (as opposed to, say, some physical systems) is so hard, and why we don’t even attempt perfection.
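To make the checklist concrete, here is a minimal sketch in Python. The property names, the scoring thresholds, and the verdict strings are all my own invention, so treat it as illustrative rather than anything taken from Russell and Norvig: it simply records where the sales-forecast environment falls on each dichotomy and counts how many land on the difficult side.

```python
# A rough feasibility checklist based on Russell and Norvig's task-environment
# dichotomies. True means the environment falls on the more difficult side of
# the corresponding dichotomy. Names and thresholds are my own shorthand.

SALES_FORECAST = {
    "partially_observable": True,   # we cannot know every relevant fact
    "multi_agent": True,            # buyers, sellers, competitors
    "competitive": True,            # competitors may react to our actions
    "stochastic": True,             # demand is noisy
    "sequential": True,             # today's actions shape tomorrow's outcomes
    "dynamic": True,                # the market moves while we deliberate
    "continuous": True,             # continuous time, continuous quantities
}

def difficulty(environment: dict) -> str:
    """Count how many dichotomies fall on the difficult side."""
    score = sum(environment.values())
    total = len(environment)
    if score <= 2:
        return f"{score}/{total}: tractable"
    if score <= 4:
        return f"{score}/{total}: hard - simplify where you can"
    return f"{score}/{total}: lower expectations now"

print(difficulty(SALES_FORECAST))  # 7/7: lower expectations now
```

The thresholds are arbitrary; the value is in being forced to say, explicitly, which side of each dichotomy you are on before the project starts.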
What to do when faced with right-side-heavy environments? First, massively lower expectations (which is what we do with forecasting). Second, if possible, approach the problem by making some simplifying assumptions (moving from the right side to the left). Can we ignore the competitors for now? Can we treat the environment as static even if, strictly speaking, it is not? Can we approximate the situation using discrete time? But in each case track the implications of the simplification, or it may come back to bite you.
By the way, as human beings, we operate for the most part in partially observable, multi-agent, stochastic, sequential, dynamic, continuous environments, which is why we are so damn great and not about to be replaced.
Please do send me your questions and work dilemmas. You can DM me on Substack or email me at simon@coppelia.io.
The white stuff
E T Jaynes is a cult figure among probabilists. He wrote clearly and inspiringly about the deepest aspects of probability; he crossed disciplines, invented new ways of looking at the world, and quietly got on with the business of being a genius.
This month I’ve been reading his seminal 1957 paper Information Theory and Statistical Mechanics, where he introduces for the first time his principle of maximum entropy. The principle uses Shannon’s concept of information entropy - at the time only recently formulated - to update Laplace’s principle of insufficient reason (in the absence of any relevant information, we should assign equal probabilities to all possible outcomes). Jaynes redefines this in terms of entropy: in the absence of information, it is logical to assume the probability distribution that is maximally non-committal - in other words, the one that contains the least information, ergo the one that has the highest entropy. Philosophically, this puts the approach to probability on a firmer foundation, since it removes “the apparent arbitrariness of the principle of insufficient reason, and, in addition, it shows precisely how this principle is to be modified in case there are reasons for ‘thinking otherwise.’” But it is also a whole new way to build a probability distribution, and as such it unlocks a whole new line of problem-solving techniques. (I am using it in a project right now.) If you are still not satisfied, it links (as Jaynes shows in the paper) the statistical concept of entropy to the thermodynamic one.
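To see what the principle looks like in practice, here is a small Python sketch. It is a toy of my own, in the spirit of Jaynes’ later dice examples rather than anything taken from the 1957 paper: with no information beyond normalisation, maximum entropy returns the uniform distribution; add a single constraint (say, a known mean) and the maximally non-committal distribution becomes an exponentially tilted one, found by solving for one Lagrange multiplier.

```python
import numpy as np
from scipy.optimize import brentq

# Maximum-entropy distribution over the faces of a die, subject to a known
# mean. With no constraint beyond normalisation the answer is uniform; a
# mean constraint tilts it to p_i proportional to exp(lam * i).
faces = np.arange(1, 7)
target_mean = 4.5  # illustrative constraint, not taken from the paper

def mean_for(lam):
    """Mean of the tilted distribution with multiplier lam."""
    w = np.exp(lam * faces)
    p = w / w.sum()
    return p @ faces

# Solve for the Lagrange multiplier that satisfies the constraint.
lam = brentq(lambda l: mean_for(l) - target_mean, -10, 10)
w = np.exp(lam * faces)
p = w / w.sum()
print(np.round(p, 3))  # leans towards high faces, maximally non-committal otherwise
```

Each additional piece of testable information adds another constraint and another multiplier; the distribution stays maximally non-committal about everything you have not told it.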
If the paper gets you hooked, then Jaynes’ classic, posthumously published masterpiece - like I said, a cult figure - Probability Theory: The Logic of Science is available here.
The dunghill
When I’m asked by clients for my opinion on synthetic respondents, it’s a bit awkward. On the one hand, I feel strongly that most commercial implementations are going to be a joke. On the other, I can’t rule out the possibility that, despite this, they will drive a burgeoning market in which millions are made. This in turn made me think about the necessary conditions for a market built on pseudoscience. These conditions will have to explain the markets we have seen so far - digital attribution modelling, bad (i.e. most of) market mix modelling - as well as the ones just on the horizon (synthetic respondents, synthetic sample-size boosting). Here’s where I’ve got to. As ever, I’m interested in your views.
Necessary conditions for a market built on pseudoscience
1. Someone needs to benefit. A direct benefit to the business is impossible, since the thing doesn’t work. But there are other ways: someone’s reputation can be enhanced by the work - they can catch some of the reflected glory of the latest technological advance. Or a project, a campaign, a controversial business decision can be retrospectively justified.
2. Any checking of the results needs to be impossible, or at least highly disincentivised. Under ideal conditions, the claims will be truly unfalsifiable (checking off Karl Popper’s classic definition of a pseudoscience). But it might just be that a check is expensive, and then who wants to spend money on potentially undermining a favourable result (see the next point)? Contrast this with applications of statistics and data science such as prediction or network optimisation, where just using the method is a check on its performance.
3. A corollary of (2) is that with enough tweaking and twisting (see QRPs) the given technique can be made to say whatever the client wants it to say. Such a feature will further activate the market, since it produces an endless stream of satisfied customers.
4. The technique employed must look like real science, else why would it be taken seriously? The surest way to do this is by borrowing a technique that works in its own limited domain and then aggressively misapplying it.
5. To be convincing, the market must be run by true believers. For this to happen, the details of the technique must be sufficiently complex to fool the practitioners into thinking that it is the real deal. Ideally there should be debates, conferences, controversies, rival theories. Ultimately it should generate its own history - who could doubt it after that?
6. Finally, the average buyer needs to be incapable of calling bullshit. In the best-case scenario, they are scientifically and mathematically illiterate, so that, even if they wanted to (and often they do not - see point 3), they could not question the result.
I think synthetic respondents - in all but the most cautious use cases - score a solid six here, which makes me feel better about simultaneously slamming them and predicting huge growth.
If you have some particularly noxious bullshit that you would like to share then I’d love to hear from you. DM me on Substack or email me at simon@coppelia.io.
From Coppelia
After last month’s dunghill querying the approach to relative-lift AB testing taken by Eppo, I got an email from their head of statistics. They had accepted the main part of my criticism and put in a fix. Now I can't emphasise enough how rare this is. Usually the response from a business to this kind of criticism is silence, or aggressive technobabble. But this was gracious, grown-up behaviour, and deserves respect!
This month I once again attempted to embed an LLM within a process; once again the result was disappointing. To be clear, I'm not talking about the use of an LLM as a sidekick - in coding, data manipulation, research, problem-solving. There the results are, I’ll admit, phenomenal. The point of failure for me (and for many others - I don't think I'm being particularly controversial here) has been when the little ones are left to play by themselves. Then all hell breaks loose. Yes, I know about all the clever things you can do with prompts and suchlike, but I reach the point where saying nice things to them just gets frustrating - perhaps even a touch humiliating - and I return to the old ways.
If you’ve enjoyed this newsletter or have any other feedback, please leave a comment.
¹ I have left out the known/unknown dichotomy from Russell and Norvig’s list since it is not particularly useful for assessing feasibility.