A Parastatal Problem

A friend informed me of the existence of Parastat, a new ActivityPub server project with some interesting goals, and we were quite pleased to see that peer production would initially be represented by a big-name project1. After reading the only material published at the time, What is Parastat?, I felt much less optimistic than before; what was being proposed did not appear as radical as we had hoped.

We are concerned with how Parastat would reflect on the new peer production movement, and with how poorly it approaches the other problems a peer production programmer should consider, beyond licensing. It has taken me some time to figure out what we would want, and what went wrong, but we believe we can now articulate it properly in this essay, and summarise it as:

One begins to see the norms of the corporate world seep into "free software" or "open source" projects, with their ruthless advertising and competition2, and their frequently masochistic design choices, partly due to recuperation by the corporate world itself. While that world has not tried to infest our movement yet, it is certainly appearing in spirit.

We are being asked, in effect, by the developers' announcement, to go back to the low-level programming language, to watch them smear other projects, to reinforce hierarchical power structures, and to partake in the sort of development hell one would expect from a lousy startup looking to impress a venture capitalist3! And we are to accept this as the start of the peer production software movement‽

Surely not. We shall begin to discuss this problem now.

How to not confuse yourself and others about hardware requirements

We may as well begin with probably the least disagreeable issue with the Parastat program: the authors claim Mastodon is inefficient, based on a memory usage figure from a poorly defined situation:

[Mastodon is] not made for efficiency, so it requires expensive hosting fees to keep an instance going.

Mastodon uses around 2GB just to get started.

In order of least problematic to most, this claim could be:

  • true. Starting a Mastodon server requires 2GB of memory, and everyone else has decided to live with it. However, Ben Lubar, who runs a plain Mastodon server, asserts their server uses only 200MB of memory, an order of magnitude below that figure4.
  • a figure from a different interpretation of "just to get started".
  • a figure from an actual Mastodon instance, but not configured to reduce memory usage, and thus not the actual minimum requirement.
  • completely and utterly bogus.

The first and last possibilities don't need much further explanation, so we will just describe the second and third.

How to create reproducible statistics

To reproduce something close to this figure, we would first need to know what constitutes "just [getting] started". Is this with one user just after the server has started, or with many users after the server has run for some time? Does this include the memory used by the SQL server and the rest of the operating system? Providing such a shocking figure, without an explanation of how it was observed, is a very dirty way to obtain a following. This figure should be explained further, or removed from the announcement.
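At minimum, a reproducible figure should name exactly what was measured. As an illustration, here is a minimal sketch, assuming Linux and its /proc filesystem, which reports the resident set size of one process; note that it measures a single process, so a figure for "Mastodon" would also have to say which of its Ruby, Node and SQL processes were included:

/* rss.c - print the resident set size of one process, in megabytes.
   Linux-specific: reads the VmRSS field from /proc/<pid>/status. */
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    char path[64];
    snprintf(path, sizeof path, "/proc/%s/status", argv[1]);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    char line[256];
    long rss_kb = -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "VmRSS: %ld", &rss_kb) == 1)
            break;
    fclose(f);
    if (rss_kb < 0) {
        fprintf(stderr, "no VmRSS line found\n");
        return 1;
    }
    printf("%.1f MB resident\n", rss_kb / 1024.0);
    return 0;
}

Even this small program embodies decisions (resident versus virtual memory, one process versus a process tree) that the 2GB claim leaves unstated.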

How to improve throughput by eating memory

Claiming that the minimum resource usage is the average is obviously incorrect. However, we should now ask ourselves whether Mastodon was designed for high performance given sufficient resources5, or for efficiency. The intuitive relation between resource consumption and performance is that providing more resources increases performance (and efficiency is the ratio of performance to consumption). Some strategies improve performance when additional resources are present, but do not necessarily raise the minimum resources required when they are absent.

One such strategy is to use otherwise free memory to cache values, allowing them to be reused without recomputing them or reading them off a relatively slow disk or network. The Linux kernel's reporting used to be rather blunt about caching: memory used for caching was reported as "used", even though it could be released to programs very quickly. This caused people to think that the kernel was wasting their memory, and prompted someone else to write a website describing the situation.

The ZFS file system also caches data to improve performance, with similar effects at first sight; but it still works (with reduced performance) when less memory is available for caching.

Linux is borrowing unused memory for disk caching. This makes it look like you are low on memory, but you are not! Everything is fine!

ZFS uses different layers of disk cache to speed up read and write operations. Ideally, all data should be stored in RAM, but that is usually too expensive. Therefore, data is automatically cached in a hierarchy to optimize performance versus cost.

This telling of the story should convince you that the kernel was not wasting memory, and that it was a simple misunderstanding caused by confusing terminology6.

With this in mind, we may now ask whether the Mastodon instance behind the 2GB figure was caching some media, and thus running "faster" by reading from the disk less often. We cannot, then, decide that Mastodon is inefficient solely from this memory usage figure.
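To make the technique concrete, here is a toy sketch of such a cache (not Mastodon's actual mechanism, merely an illustration): it spends otherwise free memory to skip disk reads, and can hand all of that memory back at once, exactly like the kernel's page cache:

/* cache.c - a toy read cache: spends free memory to avoid disk
   reads, and gives all of it back the moment memory is wanted. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 64

struct entry {
    char *name;   /* file name, or NULL if the slot is empty */
    char *data;   /* cached contents */
    long  size;
};

static struct entry cache[SLOTS];

/* Return a file's contents, reading from disk only on a miss.
   NB: returned pointers are only valid until the entry is evicted;
   a real cache would reference-count them. */
static char *cached_read(const char *name, long *size)
{
    unsigned slot = 0;
    for (const char *p = name; *p; p++)   /* crude string hash */
        slot = slot * 31u + (unsigned char)*p;
    slot %= SLOTS;

    struct entry *e = &cache[slot];
    if (e->name && strcmp(e->name, name) == 0) {
        *size = e->size;                  /* hit: no disk access */
        return e->data;
    }

    FILE *f = fopen(name, "rb");
    if (f == NULL) return NULL;
    fseek(f, 0, SEEK_END);
    long n = ftell(f);
    rewind(f);
    char *buf = malloc(n);
    if (buf == NULL || fread(buf, 1, n, f) != (size_t)n) {
        free(buf); fclose(f); return NULL;
    }
    fclose(f);

    free(e->name); free(e->data);         /* evict the old entry */
    e->name = strdup(name);
    e->data = buf;
    e->size = n;
    *size = n;
    return buf;
}

/* Release every cached buffer; the program still works afterwards,
   just with more disk reads - exactly the kernel's bargain. */
static void drop_cache(void)
{
    for (int i = 0; i < SLOTS; i++) {
        free(cache[i].name); free(cache[i].data);
        cache[i].name = cache[i].data = NULL;
        cache[i].size = 0;
    }
}

The point of the sketch is the last function: memory spent this way is not "used" in any worrying sense, as it can be returned the moment something else needs it.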

In contrast to this 2GB memory figure, the Parastat developers believe they can support 1,000 users (also poorly defined; are they all active at the same time, or are none online, etc.?) with 64 megabytes of memory; about a third of the memory used by Ben's one-user server. This is roughly a 3000-fold increase in user/memory density7: about 200MB per user on Ben's server, against 64MB ÷ 1,000 ≈ 0.064MB per user. Is it possible to do so without disposing of performance techniques like caching?

This 2GB memory figure is unlikely to be reproduced, and it misrepresents the minimum a Mastodon server could work with, which is an inappropriate way to gain support for a project. It may also show the developers are not entirely familiar with optimisation techniques such as caching; nonetheless, we want the developers to explain this figure better.

How to make promises

It is also very difficult to write a C program that handles untrusted inputs (such as API queries over the Internet) with acceptable safety and uptime. The notion that a C program must be fast (and a program in a dynamic language slow), and thus that writing complex programs in C will be a net gain, is also harmful, and furthermore treats programmer time as almost worthless.
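As a hypothetical illustration (not Parastat code), consider the classic way untrusted input goes wrong in C, and how the fix must be repeated at every one of the thousands of buffer manipulations in a server:

#include <stdio.h>
#include <string.h>

/* A classic mistake when parsing untrusted input: */
void handle_query(const char *param)   /* arrives off the network */
{
    char name[64];
    strcpy(name, param);   /* no length check: a 65-byte parameter
                              overruns the stack, and a crafted one
                              may hijack the control flow */
    /* ... */
}

/* The bounded version is easy here, but must be written, correctly,
   every single time: */
void handle_query_bounded(const char *param)
{
    char name[64];
    snprintf(name, sizeof name, "%s", param);  /* truncates safely */
    /* ... */
}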

How to pick appropriate technologies

Much of the content we have seen about Parastat has been about the user-facing design, but we believe the programmer- and operator-facing design may suffer from the choice of the C programming language.

It is evident that writing a stable C program is quite difficult without much testing and deliberation. Stability, of course, relates to the hosting costs that the developers wish to minimise, as a more stable server does not require its operators to spend time restarting it and checking that it works. Take John Rose's experience with Sun's server software as an example of C servers being difficult to keep running:

There are some network things with truly stupendous-sized data segments. Moreover, they grow over time, eventually taking over the entire swap volume, I suppose. So you can't leave a Sun up for very long. That's why I'm glad Suns are easy to boot!

John Rose to sun-users, from The Unix-Haters Handbook

We would be pleasantly surprised if the Parastat developers were able to achieve better uptime than the small group of professional programmers at Sun8. Of course, many projects written in C are fairly safe and stable, such as the Apache web server9, but without many person-hours spent reviewing the code and analysing problems like John's memory leaks, it is unlikely that Parastat will achieve excellent stability. To give a sense of the scale of review required, Ravenbrook reviewed the Memory Pool System, a well-deployed, safe project written in C, at a rate of less than 10 lines per minute with four to six people10, on top of any other cursory analysis done while developing the product.

John has his own opinion on how the software could have been greatly improved:

But why should a network server grow over time? You've got to realize that the Sun software dynamically allocates very complex data structures. You are supposed to call "free" on every structure you have allocated, but it’s understandable that a little garbage escapes now and then because of programmer oversight. Or programmer apathy. So eventually the swap volume fills up! This leads me to daydream about a workstation architecture optimized for the creation and manipulation of large, complex, interconnected data structures, and some magic means of freeing storage without programmer intervention. Such a workstation could stay up for days, reclaiming its own garbage, without need for costly booting operations.

That doesn't sound very much like C, but it does sound like what we would want, in order to keep the system running.
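John's "little garbage" is easy to reproduce. A hypothetical sketch of how it escapes in a long-running server:

#include <stdlib.h>
#include <string.h>

/* Hypothetical per-request handler in a long-running server. */
int handle_request(const char *input)
{
    char *copy = strdup(input);        /* dynamically allocated */
    if (copy == NULL)
        return -1;

    if (strlen(copy) > 4096)
        return -1;                     /* oops: this early return skips
                                          the free() below, so a few
                                          bytes escape on every
                                          oversized request */
    /* ... do the actual work ... */
    free(copy);
    return 0;
}

A garbage collector would reclaim the copy regardless of which path returns; in C, every early return must be audited by hand, and the swap volume eventually fills up.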

It will also be tricky to ensure all errors are handled appropriately (or at least that the server shuts down cleanly) in C, due to the absence of an exception or condition system. Recovering in a concurrent situation is still tricky, and how best to approach it is not immediately obvious, but C does not do any favours there.
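In the absence of exceptions, the conventional C idiom is to thread error codes and goto labels through every function that acquires resources; a sketch of what "handling every error" looks like:

#include <stdio.h>
#include <stdlib.h>

/* Every acquisition needs a matching release on every error path,
   usually arranged as a chain of goto labels. */
int serve(const char *path)
{
    int ret = -1;
    FILE *f = NULL;
    char *buf = NULL;

    f = fopen(path, "rb");
    if (f == NULL)
        goto out;

    buf = malloc(8192);
    if (buf == NULL)
        goto out_close;

    if (fread(buf, 1, 8192, f) == 0)
        goto out_free;

    /* ... actual work ... */
    ret = 0;

out_free:
    free(buf);
out_close:
    fclose(f);
out:
    return ret;   /* forget one label, or jump to the wrong one,
                     and a resource leaks or is freed twice */
}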

The community also desires very readable code:

i take a literate stance; i believe source files should be readable by people who aren't programmers (which DOES mean repeating yourself); but that's also a LOT of work

A Parastat Discord participant

I take your stance, but usually from the perspective of "I'm gonna forget the mental state I wrote this in and I wanna preserve that and I don't want to have to read the code to understand what it does when I inevitably forget"

The frontend developer

This is quite achievable using Elm11, but less so using C, which, by nature of being a low-level language, "requires attention to the irrelevant", severely limiting readability. Having fewer means of abstraction12 further complicates understanding the behaviour of some code at a glance. It seems very unlikely that casual programmers would be able to begin analysing and modifying Parastat server code quickly, let alone that non-programmers would be able to read it.

How to not get pwned

Of course, crashing is a fairly acceptable thing to do when things go wrong. Being able to construct a remote code execution exploit from a memory bug is much more dangerous, and its effects can be much less obvious without further investigation. The Chromium project reports that around 70% of its serious security bugs are memory safety problems, and suggests that the most thorough way of avoiding such bugs is to avoid using unsafe languages like C++ or C. The Parastat developers are doing exactly the opposite.
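For a hypothetical sketch of why these bugs are worse than crashes, consider a use-after-free that never crashes at all:

#include <stdlib.h>
#include <string.h>

/* Illustrative use-after-free: nothing here crashes, yet an
   attacker can end up controlling the "freed" session. */
struct session { int is_admin; };

static struct session *current;

static void timeout(void)
{
    free(current);          /* freed, but the pointer is kept around */
}

static void handle(const char *input)
{
    /* The allocator may reuse the just-freed block for this
       similarly-sized, attacker-controlled buffer... */
    char *buf = malloc(sizeof(struct session));
    memcpy(buf, input, sizeof(struct session));

    /* ...so this read of freed memory may see the attacker's bytes,
       and is_admin becomes whatever they sent. No crash, no log. */
    if (current->is_admin) {
        /* privileged path */
    }
}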

While not a security vulnerability per se, the developers' claim that they want to produce a new, more private protocol while maintaining backwards compatibility with the less private ActivityPub will be difficult to fulfil. Federating with an ActivityPub server implies trusting that the other server is not going to "leak" user information in any way, and transitively, servers federating with those may also have to be as private. It is thus unlikely that a server can properly benefit from the new, more private protocol while using the old protocol simultaneously.

How to report progress

Most of my concerns, and some more, could be addressed with a source code release; we would be able to see for ourselves whether the server can indeed host 1,000 users in 64MB of memory, and whether it is stable and performant. However, we have been told the code isn't ready to be read, and that it is in our interest to wait until it is.

It would be preferable to release messy code, as then we could check whether these claims are true, and experienced programmers could possibly even help clean up the code, leading to a readable product sooner. At the very least, we want to know if any of what we've been told about the project is true.

How to improve moderation

When the people are being beaten with a stick, they are not much happier if it is called "the People's Stick".

Mikhail Bakunin

When the people are being beaten with a stick, they are not much happier if it is a particularly efficient stick, which allows many people to be beaten at once.

Probably not Mikhail Bakunin

The main issues with administrators-doing-administration are that they have power over normal users, which may not be deserved merely by being, or knowing, the server operators, and that this provokes scepticism in some people; though this is difficult to overcome without a more distributed or redundant architecture. It may, instead, be more appropriate to discuss what we should do when administrators are not actively helping, or how we can alleviate the work administrators do.

While it may be effective to use an appropriate code of conduct and license to deter bad actors locally, we are not convinced Parastat can effectively deal with external actors, or actors whose status is very subjective. Providing administrators more effective tools is a kind of micro-optimisation that ignores the potential of a more community-driven approach. Furthermore, while the developers are very enthusiastic about ensuring everyone is safe on their platform, another group of operators may not be as enthusiastic, and may not mediate and/or intervene as frequently, while still not violating the ethical constraints set by the Non-Violent Public License.

The "joke" in the title is from the definition of parastatal:

parastatal ˌpærəˈsteɪt(ə)l adj: partly or completely controlled or owned by the government

Macmillan Dictionary

I doubt the person who chose the name Parastat thought of the adjective, but it was still funny when I realised I would want to write about moderation techniques.

How to moderate even fasterer

The other issue is that we have to go through administrators to get anything done. In the worst case, the administrators don't act upon any reports they receive, possibly due to intoxication by "free speech" ideology, or from just being busy with other things. Typically, there is some latency, on the order of hours, between someone notifying an administrator and an administrator acting upon the report. This, again, also "costs" people, as they have to take some time to process all the reports; more so when they have to wade through some unpleasant content.

With many more normal users than administrators, it should be possible to leverage the flags produced by users, with software that tries to filter out content which it believes a given user would probably flag. This technique already exists, and is called collaborative filtering; a sketch follows. While it is usually employed in unfortunate places such as targeted advertising, the designs produced should be more than enough to effectively automate most of the role of an administrator.
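Here is the simplest form of the idea, user-based collaborative filtering, with made-up data and an arbitrary threshold: compute how similarly two users have flagged things, then predict a user's flag for a new post from the flags of similar users.

#include <math.h>
#include <stdio.h>

/* flags[u][p] is 1 if user u flagged post p, else 0. */
#define USERS 4
#define POSTS 6

static const int flags[USERS][POSTS] = {
    {1, 1, 0, 0, 1, 0},
    {1, 1, 0, 0, 0, 0},
    {0, 0, 1, 1, 0, 0},
    {0, 0, 1, 0, 0, 1},
};

/* Cosine similarity between two users' flagging histories. */
static double similarity(int a, int b)
{
    double dot = 0, na = 0, nb = 0;
    for (int p = 0; p < POSTS; p++) {
        dot += flags[a][p] * flags[b][p];
        na  += flags[a][p] * flags[a][p];
        nb  += flags[b][p] * flags[b][p];
    }
    if (na == 0 || nb == 0) return 0;
    return dot / (sqrt(na) * sqrt(nb));
}

/* Predict how likely user u is to flag post p: a similarity-weighted
   average of what everyone else did with that post. */
static double predicted_flag(int u, int p)
{
    double score = 0, weight = 0;
    for (int v = 0; v < USERS; v++) {
        if (v == u) continue;
        double s = similarity(u, v);
        score  += s * flags[v][p];
        weight += s;
    }
    return weight > 0 ? score / weight : 0;
}

int main(void)
{
    /* Hide post 4 from user 1 if similar users flagged it enough. */
    double s = predicted_flag(1, 4);
    printf("user 1, post 4: %.2f %s\n", s,
           s > 0.5 ? "(hide)" : "(show)");
    return 0;
}

A real deployment would have to weigh flags by trust and guard against coordinated malicious flagging, but the core computation really is this small.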

Conclusion

As programmers engaging in peer production, we are concerned with how to act in a cooperative manner, with users and programmers alike. However, we have not seen the Parastat developers act cooperatively: they make large claims that are backed neither by code demonstrating their possibility, nor by explanations providing the important context behind them.

Some of the goals of the Parastat developers are quite laudable, such as the emphasis on ensuring sustainable interactions between users, and the attempt to improve on the efficiency of a server; but these manifest in undesirable ways, like using an unsafe low-level language, and reinstating a form of digital feudalism between server administrators and users. This hurts users not only immediately, but also sets a poor precedent for future developers, who will then have to transcend the limiting structures set by the Parastat developers, just as the Parastat developers imagine they have transcended the standards of other federated system developers.

Footnotes:

1

"If you wanted a project, why didn't you start one yourself?" I have been working on a project that I think represents my view of peer production and its role in forming a liberatory technology, but that has been in development for two years, and the announcement of the project did not attract much attention. The developers of Parastat have much more social capital as well, due to some of their previous work.

2

You may rightfully ask if I am being uncooperative, and I cannot disagree. My excuse is that I can't tell if the developers are acting in good faith (as described later), and so I don't know if I should bother to reason with them directly.

3

They state they "don't just want to make a distributed version of Twitter", but they appear to be closer to just that than any other project!

4

I wanted to test this for myself, but the installation instructions for Mastodon were not very easy to follow, as I have not installed many Ruby and/or Node.js programs before, and the instructions provided package names for an older release of Ubuntu that I could not find in the latest release. Coupled with a slower than usual virtual machine, this deterred me from testing for myself, and I will have to believe Ben for now. Make no mistake: this essay is not a recommendation for any other ActivityPub-using software, but the Parastat developers' attempt to put Mastodon down is quite inappropriate.

5

For reference, a virtual machine with 2GB of memory can be rented for about US$20 a year on sites such as lowendbox, so it is unlikely that a server running today has less than 2GB of memory; a server developer may thus safely assume they have about that much memory to toy around with.

6

When using the free command, the "available" value is the amount of memory that is available for other programs to use, which is greater than or equal to the "free" value.

7

It is likely that there is a large fixed initial memory usage, and then only a small increase for each user, so it would not be very accurate to compare servers by simply dividing user count by memory usage.

8

A programmer being a "professional" does not make them a good programmer by default, but they are often held to high standards.

9

Or not! Apache has had about fifty vulnerabilities per year, but many are due to unrelated programming errors, such as not properly escaping content sent to clients. Still, many vulnerabilities could have been avoided using a safe language.

11

Elm is notable here for being a language with ML-like syntax and semantics that compiles to JavaScript, making it a convenient functional language for web frontends.

12

No doubt at least one C weenie would like to interject now and tell me that C is very readable at the small scale, and that with no magic hiding what is going on, it is trivial to tell what a random snippet is doing. I can only say that a random snippet probably isn't doing very much at all, and it's probably not what a casual reader is interested in.