The conformance test suite is currently mostly focused on “what does an explicit type annotation mean”
A shared spec for this is important because if you write a Python library, you don’t want to have to write a different set of types for each Python type checker
Here are some things the spec has nothing to say about:
Inference
You don’t want to annotate every expression in your program. Type checkers have a lot of leeway here and this makes a huge difference to what it feels like to use a type checker.
For instance, mypy will complain about the following, but pyright will not (because it infers the types of unannotated collections as having Any):
x = []
x.append(1)
x[0] + "oops"
The spec has nothing to say about this.Diagnostics
The spec has very little to say about what a type checker should do with all the information it has. Should it complain about unreachable code? Should it complain if you did `if foo` instead of `if foo()` because it’s always true? The line between type checker and linter is murky. Decisions here have nothing to do with “what does this annotation mean”, so are mostly out of scope from the spec.
Configuration
This makes a huge difference when adapting huge codebases to different levels of type checking. Also the defaults really matter, which can be tricky when Python type checkers serve so many different audiences.
Other things the spec doesn’t say anything about: error messages quality, editor integration, speed, long tail of UX issues, implementation of new type system features, plugins, type system extensions or special casing
And then of course there are things we would like to spec but haven’t yet!