Programming Language Design

What ought to be considered when designing a programming language? While it can be fun to build a language focused on a single aspect for a side project with a single user, language designers must weigh a vast array of concerns to create a language with wide appeal. A new language must bring something new to the table to compete, while also providing the niceties expected of an established language.

What follows is a list of things to consider, which aims to be more or less exhaustive. Users will typically decide whether or not to adopt a language based on one or more of these considerations. This is not to say that it's pointless to make a language without considering all of these aspects; language design can be challenging and satisfying in its own right.

Basis

What is the core reason for this language to exist and be used?

Possible reasons include:

Clearly, not every language needs all of these reasons to exist, and some reasons contradict others. The point is that a new language should have at least one.

Semantics

Syntax

Embedded Domains

The core syntax of a language is usually composed of distinct clusters of syntax, each handling a different subsystem of the language. Most of these can be omitted (as evidenced by Lisp).

Common examples:

What to include is a tradeoff between complexity and readability, and also has implications for the "intended use case" of the language.

Value Embedded DSLs

Libraries usually define their own 'language' with which to discuss their domain, but do so in a way that only makes use of the general syntactic constructs for values, functions, types, etc. The DSLs discussed here are not of that nature. Of interest here are embedded DSLs which impose additional syntactical constraints outside of the language's typical mode of parsing. Usually these DSLs are encoded as strings and parsed at runtime.

For example, consider the difference between printf's format specifiers and Python's f-strings. Because f-strings are a distinct literal form, they can be statically parsed and proved syntactically valid. printf's format string, on the other hand, potentially hides structural unsoundness until runtime.
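The contrast can be sketched in Python itself, using %-style formatting as a stand-in for printf's value-encoded format DSL (the function names here are illustrative, not from any particular codebase):

```python
# %-style formatting: the format string is an ordinary value, so a
# mismatch between specifier and argument only surfaces at runtime.
def describe_percent(count):
    return "%d items" % count

# f-string: the expression slots are part of the grammar itself, so a
# malformed literal like f"{count items" is rejected at parse time,
# before the program ever runs.
def describe_fstring(count):
    return f"{count} items"

print(describe_percent(3))   # 3 items
print(describe_fstring(3))   # 3 items

# The failure mode unique to the value-encoded DSL: the broken call
# parses fine and only blows up when this line actually executes.
try:
    "%d items" % "three"
except TypeError as e:
    print("runtime error:", e)
```

Both produce the same output on the happy path; the difference is entirely in *when* the malformed case is caught.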

More examples:

Value embedded DSLs are a tradeoff: building them into the grammar allows more complete compile-time checks but adds complexity to the core language, while delegating them to libraries keeps the grammar simple but hides code structure in values (usually strings). Libraries that implement a value embedded DSL must duplicate the parsing, validation, and error-reporting machinery that already exists in the core language. If the core language does not expose this machinery to libraries, then handling will be inconsistent across them.
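Regular expressions are a canonical instance of this duplication. In Python, a pattern is just a string to the core grammar; the `re` module supplies its own parser and its own error type, entirely separate from the language's syntax errors:

```python
import re

# A malformed pattern is a perfectly valid string literal. Only the
# library's own parser, invoked at runtime, discovers the problem and
# reports it through its own error machinery (re.error, not SyntaxError).
try:
    re.compile("(unclosed")
except re.error as e:
    print("library-level syntax error:", e)

# A well-formed pattern compiles and works, but nothing at the grammar
# level distinguished it from the broken one above.
digits = re.compile(r"\d+")
print(digits.findall("a1b22c333"))  # ['1', '22', '333']
```

Languages that instead give regex a literal form (e.g. `/\d+/` in JavaScript or Ruby) move that parse error to compile time, at the cost of a larger core grammar.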

External Data

External Code

Project Structure

Performance

Tooling

Standard Library

Documentation

Community, Accessibility

Implementation

A well-designed and well-specified language should be independent of its implementation; multiple implementations may exist. This section is therefore concerned not with language design per se, but with resources and considerations for implementing compilers, interpreters, and other tooling.

Parsing

More Ideas for New Languages

Things I think would be interesting to have in a new language. Features big and small to serve as inspiration.

Other Resources

Other Thoughts

compunomicon

I've been obsessing lately over the idea that there's no good reason for more than a single implementation of sort to exist on a single computer. Why do we need sort implemented in C, Python, JavaScript, etc., each with a (potentially) different and arbitrary choice of algorithm?

I'm imagining a hyper-monolith that is compiler, operating system, scripting language, browser, and JavaScript all in one. It would run on desktops, tablets, supercomputers, smartphones, and microcontrollers. Its meta-algorithms would select the appropriate way to sort given what it knows about its resource constraints and the data it is sorting.

The data it collects on what is sorted, how often, and how much, would be compiled; it would know its own strengths and weaknesses. This data could be used to tune the meta-algorithm. Perhaps that tuning would inform the next generation of chip design: the monolith could compile real-world performance analytics into Verilog, to produce a chip that performs faster.

The monolith generates its own documentation from its own source with the right flag.

I believe all the pieces exist, in one form or another. The only question is if they all fit together in a useful shape.

computer literacy

The day everyone has a basic understanding of computing, we will see a fundamental shift in our society similar to the shift from preliterate to literate societies. I think it may be useful to develop a lingua franca for this situation while we have the opportunity.

:x fractal

the same ideas recur, with varying levels of sophistication, in widely different contexts.

IPC is just networking.

ISAs are DSLs

everything Turing complete is equivalent.

The browser is a desktop environment with better ergonomics for how it installs programs (websites are Turing complete, after all).

Caching happens in the processor (from memory), in memory (from the disk), in the disk (from the network), and in the CDN (from the site). If a program can be programmatically reinstalled or updated when needed, how is that different from a cached web resource? Even a compiled binary could be considered a cache of the source (again, if it can be dynamically recompiled when needed).

compilation and compression are intimately linked.

:x inbox

finish treerat maybe?