mat~



Web Protocol Idea

Fri Nov 27 10:41:31 AM UTC 2020

Introduction

I have been reading about and exploring the WWW a lot recently, and learning Erlang gave me ideas for server/client applications to write as soon as I am done with the learning phase of Erlang. I have been thinking about this project for quite a while, so I thought I should write a blog post to speak out all my design ideas.

I will try not to get too ambitious and to limit myself to what I can probably do. The point of this project is to learn more about network protocols (especially TCP) by making my own, just like I learned how CPUs and VMs work with the VICERA project.

My goal is to make this protocol simple, both to understand and to parse in any programming language. As a reference limitation, I am aiming for something a shell script could parse. It is an extreme limitation, but it will help me make something good.

Everything I say here is just a design idea that I wanted to speak out. I have not planned to actually make this project yet, and these design ideas may never become a real thing.

This blog post has 3 parts. The first one is all about the protocol itself, and the second one is all about the language used to format pages. The last one is about some standards I thought about and the possible implementations of this protocol.

Have a good read!

Web Protocol Design

KISS: This protocol will be dead simple to parse as well as dead simple to read. The header will fit on a single line (or maybe two, we will see later, as the design is not completely clear yet). The current format is:

-- Request --
    PROTOCOL HOSTNAME URL KEYWORDS...
    [ content... ]
-- Response --
    PROTOCOL/VERSION STATUS HOSTNAME URL KEYWORDS...
    [ content... ]

PROTOCOL/VERSION gives the name of the protocol and its version. This helps when the browser is outdated and does not support the latest version of the protocol.

STATUS is, like in HTTP, an integer defining the current status. However, to simplify everything, the status will not be followed by a message the way it is in HTTP (200 OK or 502 Bad Gateway). I will also try to reduce the number of error codes as much as I can.

To provide a convenient way to host multiple subdomains without needing a different IP address for each subdomain (or using some proxy software or whatever), I have included HOSTNAME, which specifies the hostname the client wants to send its request to.

URL is, like in HTTP, the path to the file you want to request.

KEYWORDS... describe the request or response, letting the receiving side know what it is currently getting. See the example below.

For example, the client requests a page at test.h3liu.ml/test.bin. The server has to tell the client that test.bin is not a web page but a compressed octet stream. The response will then look like this:

NOTE In this example we will assume that the status code equivalent to HTTP's "OK" is 200.

PROTOCOL/1.0 200 test.h3liu.ml /test.bin binary compressed
[ Yum data... ]
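
Here is a minimal sketch, in Python, of how a client might split the response header above into its fields. The names ResponseHeader and parse_response_header are my own and not part of any standard. The sketch assumes that fields are separated by blanks and that the status is a plain integer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResponseHeader:
    protocol: str
    version: str
    status: int
    hostname: str
    url: str
    keywords: List[str]


def parse_response_header(line: str) -> ResponseHeader:
    """Split a one-line response header into its fields.

    Assumed layout: PROTOCOL/VERSION STATUS HOSTNAME URL KEYWORDS...
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("header needs at least 4 fields: %r" % line)
    protocol, _, version = fields[0].partition("/")
    return ResponseHeader(
        protocol=protocol,
        version=version,
        status=int(fields[1]),  # raises ValueError if it is not an integer
        hostname=fields[2],
        url=fields[3],
        keywords=fields[4:],    # may be empty
    )


header = parse_response_header(
    "PROTOCOL/1.0 200 test.h3liu.ml /test.bin binary compressed"
)
print(header.status, header.keywords)  # prints: 200 ['binary', 'compressed']
```

Since the whole header is blank-separated, the same split is a one-liner in a shell script too, which is the limitation this design is aiming for.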

For now I have only thought about a few keywords, which are:

binary  page    text    compressed  api post    get
tui

All unknown keywords must be ignored. If keywords don't provide enough information about the request/response, abort.
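
The two rules above could be sketched like this in Python. The split between "content" keywords (binary, page, text) and the others is my assumption: nothing here says yet which keywords count as enough information, so treat CONTENT_KEYWORDS and select_keywords as hypothetical.

```python
KNOWN_KEYWORDS = {"binary", "page", "text", "compressed", "api", "post", "get", "tui"}

# Assumption: a message is only usable if it names exactly one content kind.
CONTENT_KEYWORDS = {"binary", "page", "text"}


def select_keywords(keywords):
    """Drop unknown keywords, then abort if the content kind is unclear."""
    known = [k for k in keywords if k in KNOWN_KEYWORDS]
    kinds = [k for k in known if k in CONTENT_KEYWORDS]
    if len(kinds) != 1:
        raise ValueError("cannot tell what kind of content this is: %r" % keywords)
    return known, kinds[0]


# "sparkly" is unknown, so it is silently dropped.
print(select_keywords(["binary", "compressed", "sparkly"]))
# prints: (['binary', 'compressed'], 'binary')
```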

API system: HTTP APIs are widely used and I find them very convenient for making all kinds of online applications. First off, I am getting rid of this kind of ugly format: /url/page?blah=blahblah. All the data will be specified inside the request using plain text or using the markup language designed for this protocol (we will talk about it later).

For instance, let's say we have made an awesome search engine using this new protocol. One person could send a search query like this:

PROTOCOL awesomesearch.com /search api get page
[[ "Cheap VPS" ] search ] body

And then the search engine returns what you wanted to have

PROTOCOL/1.0 200 awesomesearch.com /search api page
[ 
    [ Result 1 data... ] result
    [ Result 2 data... ] result
    ...
] body

By the way, api specifies that it's an API and page specifies that the content uses the markup language. You could, for instance, make a plain text API and specify it like this:

PROTOCOL/1.0 200 awesomesearch.com /search api text

Result 1 data
Result 2 data
...
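
Going back to the search request, here is a rough Python sketch of how a client could assemble it and how either side could separate the header from the content. The function names are made up, the literal PROTOCOL stands in for the protocol name (which does not exist yet), and I am assuming the header ends at the first line break.

```python
def build_request(hostname, url, keywords, body=""):
    """Serialize a request: one header line, then the optional content."""
    header = " ".join(["PROTOCOL", hostname, url] + list(keywords))
    return header + "\n" + body


def split_message(message):
    """Separate the header line from whatever content follows it."""
    header, _, body = message.partition("\n")
    return header, body


request = build_request(
    "awesomesearch.com",
    "/search",
    ["api", "get", "page"],
    '[[ "Cheap VPS" ] search ] body',
)
print(request)

header, body = split_message(request)
print(header.split()[1])  # prints: awesomesearch.com
```

Because fields are separated by blanks, this only works as long as hostnames, URLs and keywords never contain a blank themselves.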

Pages: What would a web protocol be without any web pages? Nothing much. For pages, like for the API, we provide a markup language, stylesheet and simple preprocessor, all in one language. Like in HTML, a page will contain a head and a body: the head will contain metadata and the body will contain the content ready to be displayed.

Here is an example response for a web page:

PROTOCOL/1.0 200 h3liu.ml /index page
[
    [ "Welcome!" ] title
    [ "Welcome to my website" ] description
] head
[
    [ "Hiya!" ] big-text
    [ "Welcome to my awesome page" ] paragraph

    ( Example of a preprocessor )
    [ [ "This browser is TUI" ] paragraph ] if-tui
] body

The preprocessor will be simple and Turing-incomplete, allowing some basic conditional stuff and such. Client-side scripting is not going to be a thing, so no JavaScript or anything like it. A client could still implement client-side scripting on top of the language, as it is totally possible to do so, but the result would be a non-standard client.

I won't prevent anyone from making non-standard use of the protocol or the language, as people can do whatever they want with it. It is just not guaranteed that a given page will work on every client.

File transfer: There are keywords for that: image and binary (for now). To reduce download time, the keyword compressed will probably be a thing. This keyword tells the client that the current response is gzip-compressed.
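
On the client side, handling compressed could be as small as this Python sketch. decode_body is a hypothetical helper, and it assumes the whole content after the header is one gzip stream.

```python
import gzip


def decode_body(keywords, body: bytes) -> bytes:
    """Undo gzip compression when the header carries the compressed keyword."""
    if "compressed" in keywords:
        return gzip.decompress(body)
    return body


# Simulate what a server would send for "binary compressed".
payload = gzip.compress(b"Yum data...")
print(decode_body(["binary", "compressed"], payload))  # prints: b'Yum data...'
```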

Language Design

As I said earlier, the protocol comes with a language that serves as API data format, markup language, stylesheet and preprocessor.

You have probably seen earlier that the preprocessor looks like a stack-based language due to its reverse Polish notation. Well, this early design of the language is inspired by Forth and Adobe's PostScript. Everything will be stack-based, as I find it convenient in this case.

My goal is to make one language that fills the purpose of 3 different languages used on the current web (PHP, HTML and CSS).

At first I thought about making it a Lisp dialect but eh, turns out it's better as a stack-based Forth-like language (everything is separated by blanks!).

The language has different data types: datapacks, words, numbers and strings.

Datapacks sit between [ and ]; I took this from the FALSE esoteric programming language. A datapack can contain data as well as code (like a lambda). A word can then just pop a datapack from the stack to get what it needs.

Example: [ "Hello" [ ", World!" ] bold ] body

Words are, like in Forth, definitions. There will be a few primitives included in the standard, but the rest will be defined using the language itself.

Words can contain any type of data except words. This comes in handy for repetitive tasks. For instance, let's say I have to make a site that says "Hello, (name)" for multiple names. I would do:

[ "Hello, " pop cr ] "hello-name" define

[
    "John" hello-name
    "Matthilde" hello-name
    ...
] body

pop is one of the primitives; it lets you pop something from the stack inside a datapack. This comes in handy for passing arguments to user-defined words.
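
To make define and pop a bit more concrete, here is one possible reading of the hello-name example as a tiny Python evaluator. Pretty much all the semantics here are my guesses: running a datapack builds a fresh stack, pop takes a value from the caller's stack, cr pushes a line break, and body just joins the result into text. To keep it short, the program is written directly as nested Python lists, with datapacks as lists and everything else as (type, value) tuples.

```python
def run(items, outer, words):
    """Evaluate a datapack's items on a fresh stack and return that stack.

    `outer` is the caller's stack, which the `pop` primitive takes from.
    """
    stack = []
    for item in items:
        if isinstance(item, list):      # datapack: pushed unevaluated
            stack.append(item)
        elif item[0] != "word":         # string or number: push the value
            stack.append(item[1])
        elif item[1] == "pop":          # take one value from the caller
            stack.append(outer.pop())
        elif item[1] == "cr":           # assumed to mean a line break
            stack.append("\n")
        elif item[1] == "define":       # [ code ] "name" define
            name, code = stack.pop(), stack.pop()
            words[name] = code
        elif item[1] == "body":         # evaluate a datapack and render it
            content = run(stack.pop(), stack, words)
            stack.append("".join(str(value) for value in content))
        elif item[1] in words:          # user-defined word
            stack.extend(run(words[item[1]], stack, words))
        # unknown words are ignored
    return stack


program = [
    [("string", "Hello, "), ("word", "pop"), ("word", "cr")],
    ("string", "hello-name"), ("word", "define"),
    [
        ("string", "John"), ("word", "hello-name"),
        ("string", "Matthilde"), ("word", "hello-name"),
    ],
    ("word", "body"),
]
result = run(program, [], {})
print(result[0], end="")
# prints:
# Hello, John
# Hello, Matthilde
```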

Numbers are going to be handy for a few preprocessor words and for stylesheets. You can also use them in pages for display, but I recommend using a string instead if no arithmetic is required. For instance, let's say the word code is a formatting word for pages that displays code in a monospace font. Its style definition would look like this:

[ 12 "monospace" font ] "code" style-def

Strings are just strings, handy for putting content in pages or for defining a word.
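
Since everything is separated by blanks, reading the four data types is straightforward. Here is a small Python sketch of a reader that turns source text into nested lists, one list per datapack. The (type, value) tuples are just my representation, and I am assuming that strings have no escape sequences and that numbers are plain integers, since neither is decided yet.

```python
import re

# One token is a bracket, a double-quoted string, or a run of other characters.
TOKEN = re.compile(r'\[|\]|"[^"]*"|[^\s\[\]"]+')


def parse(source):
    """Turn source text into nested Python lists (one list per datapack)."""
    stack = [[]]  # stack[-1] is the datapack currently being filled
    for token in TOKEN.findall(source):
        if token == "[":
            stack.append([])
        elif token == "]":
            if len(stack) == 1:
                raise SyntaxError("unexpected ]")
            datapack = stack.pop()
            stack[-1].append(datapack)
        elif token.startswith('"'):
            stack[-1].append(("string", token[1:-1]))
        else:
            try:
                stack[-1].append(("number", int(token)))
            except ValueError:
                stack[-1].append(("word", token))
    if len(stack) != 1:
        raise SyntaxError("missing ]")
    return stack[0]


print(parse('[ "Hello" [ ", World!" ] bold ] body'))
# prints: [[('string', 'Hello'), [('string', ', World!')], ('word', 'bold')], ('word', 'body')]
```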

Uses for page formatting

As I said earlier, this language is an all-in-one language for markup, stylesheet and preprocessor. This part will cover the use for markup and page formatting.

All pages are divided into two parts, head and body. The head is not required but is recommended, as it provides metadata to the client such as the title or the description of the page. The client must receive a program containing only "style words" and stylesheet definitions. Preprocessing words and unknown words should be ignored at this point.

Here are some words I thought about for now:

bold    italic  underline   title   description     meta
link    code    h1          h2      h3

Uses for stylesheet

The stylesheet will let us make our pages look more beautiful. The stylesheet system should provide as much cross-compatibility as possible between GUI and TUI clients, giving the same comfort in a GUI as in a TUI application, because terminals are cool.

As mentioned before, the main word will be style-def, which lets you provide a style definition. The language will also provide other words to tell the client how to format a style word (obviously). Here are a few:

font    decoration  margin  padding

Some words will obviously be ignored by TUI clients, but the result shouldn't differ much from a GUI.
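
As a rough illustration of a TUI client ignoring style words, here is a Python sketch. Which words a terminal would actually skip is not decided; the TUI_IGNORED set, the dict layout and the "underline" argument to decoration are all assumptions made up for this example.

```python
# Assumption: a terminal cannot change font or spacing, but can underline.
TUI_IGNORED = {"font", "margin", "padding"}


def style_for_client(style, client_is_tui):
    """Return only the style words that this kind of client should apply."""
    if not client_is_tui:
        return dict(style)
    return {word: args for word, args in style.items() if word not in TUI_IGNORED}


# Hypothetical style for the word "code": 12 "monospace" font, plus a decoration.
code_style = {"font": [12, "monospace"], "decoration": ["underline"]}

print(style_for_client(code_style, client_is_tui=True))
# prints: {'decoration': ['underline']}
```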

Preprocessing

Preprocessing runs server-side and outputs static markup and stylesheet. It will allow finite loops and if conditions. The preprocessor is designed to be Turing-incomplete for the simple reason that it should never get stuck in an infinite loop.
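
Here is a small Python sketch of what the server could do with the if-tui word from the page example earlier. It works on an already-parsed page written as nested Python lists (datapacks as lists, other items as (type, value) tuples). I am assuming that if-tui takes the datapack right before it and keeps its content only for TUI clients, and that the server learns this from something like the tui keyword in the request.

```python
def preprocess(items, client_is_tui):
    """Resolve if-tui server-side so the client only receives static markup."""
    output = []
    for item in items:
        if item == ("word", "if-tui"):
            guarded = output.pop()  # the datapack right before if-tui
            if not isinstance(guarded, list):
                raise ValueError("if-tui expects a datapack before it")
            if client_is_tui:
                output.extend(guarded)  # keep the guarded content inline
        elif isinstance(item, list):
            output.append(preprocess(item, client_is_tui))
        else:
            output.append(item)
    return output


body = [
    [("string", "Hiya!")], ("word", "big-text"),
    [[("string", "This browser is TUI")], ("word", "paragraph")], ("word", "if-tui"),
]
print(preprocess(body, client_is_tui=False))
# prints: [[('string', 'Hiya!')], ('word', 'big-text')]
```

Every datapack is visited exactly once, so this pass always finishes, which is the property the preprocessor is supposed to have.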

Not much is clear yet, so I can't say more about it.

Implementation and Standards

The server side will probably be written in Erlang, as this language is really good at server-side work and because Erlang is a cool programming language.

The client-side is not planned yet.

There will also be some standards regarding the use of the protocol, so that client authors are not confused about what to implement and what not to implement. They will also help webmasters know what to do and what not to do. Here are a few I thought about.

Conclusion

These are a few ideas I had for a protocol project. I thought about it because I find the current one kinda complicated when it could be kept simple and lightweight; pages are way too heavy nowadays.

I don't want to replace the current web, just wanna make this project for fun.

If I ever get to implement this, the project will probably be released under a FOSS license, as will the standards, and pull requests will be open.

Anyways that's all I had to say, see you for another blog!


-- EOF --