Random insight of the night: every couple years, someone stands up and bemoans the fact that programming is still primarily done through the medium of text. And surely with all the power of modern graphical systems there must be a better way. But consider:

* the most powerful tool we have as humans for handling abstract concepts is language
* our brains have several hundred millennia of optimizations for processing language
* we have about 5 millennia of experimenting with ways to represent language outside our heads, using media (paper, parchment, clay, cave walls) that don't prejudice any particular form of representation, at least in two dimensions
* the most wildly successful and enduring scheme we have stuck with over all that time is linear strings of symbols. Which is text.

So it is no great surprise that text is well adapted to our latest adventure in encoding and manipulating abstract concepts.

@rafial Both accurate and also misses the fact that Excel is REGULARLY misused for scientific calculations and near-programming level things since its GUI is so intuitive for doing math on things.

Like, GUI programming is HERE, we just don't want to admit it due to how embarrassing it is.

@Canageek very good point. Excel is actually the most widely used programming environment by far.

@rafial Now what we need to do is make a cheap, easy to use version of it that is designed for what scientists are actually using it for. Column labels, semantic labels, faster calculations, better handling of mid-sized data (the tens-of-thousands-of-data-points range), etc

@Canageek I'm wondering, given your professional leanings, if you can comment on the use of "notebook" style programming systems such as Jupyter and of course Mathematica. Do you have experience with those? And if so, how do they address those needs?

Thanks @urusan, I found the article interesting, and it touched on the issue of how to balance the coherence of a centrally designed tool with the need for something open, inspectable, non-gatekept, and universally accessible.

PDF started its life tied to what was once a very expensive, proprietary tool set. The outside implementations that @Canageek refers to were crucial in it becoming a universally accepted format.

I think the core idea of the computational notebook is a strong one. The question for me remains whether we can arrive at a point where a notebook created 5, 10, 20 or more years ago can still be read and executed without resorting to software archeology. Even old PDFs sometimes break when viewed in new apps.

@rafial @urusan Aim for longer than that. I can compile TeX documents from the 80s, and I could run ShelX files from the 60s if I wanted to.

@Canageek @rafial You aren't processing those ShelX files on any sort of hardware (or software binaries) that existed in the late 1960s. At best, you're running the original code in an emulation of the original hardware, but you are probably running it on modern software designed to run on modern hardware.

Software archeology is inevitable and even desirable

What we want is an open platform maintained by software archeology experts that lets users not sweat the details

@urusan @rafial No, they've kept updating the software since then so it can use the same input files and data files. I'm reprocessing the data using the newest version of the software using the same list of reflections that was measured using optical data from wayyyy back.

The code has been through two major rewrites in that time, so I don't know how much of the original Fortran is the same, but it doesn't matter? I'm doing the calculations on the same raw data as was measured in the 60s.

There is rarely a POINT to doing so rather than growing a new crystal, but I know someone who has done it (he used Crystals rather than Shelx, but he could do that as the modern input file converter works on old data just fine)

@Canageek @rafial We're talking about 2 different things here. Of course data from over half a century ago is still useful.

The thing that's hard to keep running decades later is the code, and code is becoming more and more relevant in many areas of science.

Keeping old code alive so it can produce consistent results for future researchers is a specialized job

Ignoring the issue isn't going to stop researchers from using and publishing code, so it's best to have norms

@urusan @Canageek one other thing to keep in mind is that data formats are in some ways only relevant if there is code that consumes them. Even with a standard, at the end of the day a valid PDF document is, by de facto definition, one that can be rendered by extant software. The same goes for ShelX scripts. To keep the data alive, one must also keep the code alive.

@rafial @urusan @Canageek And this is why all software should be written in FORTRAN-77 or COBOL.

@mdhughes @Canageek @urusan @rafial Any language that has a reasonably-sized human-readable bootstrap path from bare metal x86, 68000, Z80 or 6502 should be fine.

They don't exist. Yet. Except Forth and PicoLisp.

Also I'd add standard Scheme and standard CL to the list. You can still run R4RS Scheme code from 1991 in Racket and most (all? is there a pure R5RS implementation?) modern Schemes. CL hasn't been updated since 1994.

@clacke @Canageek @mdhughes @rafial Really you just need a well defined language spec (which is easier said than done).

The semantics of, say, addition aren't going to change. Once you define that c = a + b means adding a and b and assigning the result to c, you no longer need a reference implementation and you can treat this code like a well defined data format.

Of course, I'm leaving out a lot of detail here, like what do you do on overflow?
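To make that overflow point concrete, here's a small sketch (my own illustration, not from the thread) of how two equally valid specs can disagree about the same `c = a + b`: Python's built-in integers are arbitrary precision and never overflow, while a spec written for 64-bit two's-complement hardware, the kind a C or Fortran compiler targets, would define wraparound instead.

```python
# Python's built-in ints are arbitrary precision: a + b is always exact.
a, b = 2**62, 2**62
print(a + b)  # 9223372036854775808, i.e. 2**63, no overflow

# A spec pinned to 64-bit two's-complement hardware would instead define
# wraparound, which we can emulate by masking the sum to 64 bits:
def add_int64(a, b):
    """Two's-complement 64-bit addition, as a hardware-level spec might define it."""
    s = (a + b) & 0xFFFFFFFFFFFFFFFF   # keep the low 64 bits
    return s - 2**64 if s >= 2**63 else s  # reinterpret as signed

print(add_int64(2**62, 2**62))  # -9223372036854775808: same inputs, wrapped result
```

Both behaviors are "correct" under their respective specs; the archeology problem only starts when the spec leaves the choice unstated.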

@clacke @Canageek @mdhughes @rafial Having a reference implementation just lets you defer to the reference implementation as your spec, and if it's on a well known platform then it can be reasonably emulated on different hardware.

When you think about the reference implementation as a quasi-spec, then it becomes clear that most mainstream languages already have a reference implementation, and thus one of these quasi-specs already.

@clacke @Canageek @mdhughes @rafial In either case though, the end user doesn't care about the code archeology aspects of this.

Just because we can theoretically re-implement Python 2.5.1 as it would run on a 64-bit x86 on your future 128-bit RISC-V processor doesn't mean that you would want to.

You just want to see the results, and you don't want them to differ, say because of the 64-bit vs 128-bit difference

A standard platform facilitates this

@clacke @Canageek @mdhughes @rafial Language specs and reference implementations make the code archeology work possible for the maintainers of this open platform.

It's necessary for them to be able to cope, so the end user can ultimately have a smooth experience, and get back to their scientific research.

@urusan @clacke @mdhughes @rafial See, this is a lot of focus on getting the exact same results, which for science I think is a mistake.

You don't want the same results, you want the *best* results. If newer versions of the code use 128-bit floating point numbers instead of 64-bit, GREAT. Fewer rounding errors.

It's like, I can create this model in Shelx or Crystals. They don't implement things EXACTLY the same, but a good, physically relevant model should be able to be created in either. If I try to do the same thing in two sets of (reliable) software and it doesn't work in one, perhaps I'm trying to do something without physical meaning?

Like, it shouldn't matter if I use the exact same Fourier transform or do analysis in R, SAS, or Python. It should give the same results. Stop focusing on code preservation and focus on making analysis platform-agnostic.
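A toy illustration of that claim (mine, not from the thread): two independently written Fourier transforms, a naive quadratic DFT and a radix-2 FFT, should agree on the same input to within floating-point noise, which is exactly the cross-implementation check being argued for.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform, straight from the definition."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def fft(x):
    """Radix-2 Cooley-Tukey FFT; input length must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

signal = [1.0, 2.0, 0.0, -1.0, 1.5, 0.5, -0.5, 0.0]
a, b = dft(signal), fft(signal)
# Two different algorithms, same math: they agree to floating-point precision.
assert all(abs(u - v) < 1e-9 for u, v in zip(a, b))
```

When two reliable implementations *don't* agree like this, that disagreement is itself the finding, which is the point made in the reply below about first pinning down the deterministic part.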

@Canageek @mdhughes @urusan @rafial You want to first know that you are getting the exact same results in the part of the analysis that is supposed to be deterministic. *Then* you can upgrade things and see differences and identify whether any changes are because you broke something or because the new setup is better.

If the original setup had bugs, you want to know that, and you want to know why, and you won't be able to do that if you can't reproduce the results.

@clacke @Canageek @mdhughes @rafial Yes, this is exactly what I wanted to say.

I'd like to add that the norm of a Jupyter notebook additionally promotes the explanation of whatever you are doing in the code.

You're clearly supposed to interleave explanation (okay, now I'm doing this to the data) and code (here is exactly what I did in a format a computer can replicate).

This gives you the best of both worlds.

@clacke @Canageek @mdhughes @rafial It also helps one spot and correct errors. Maybe they meant to do one thing, but did another thing, and now all their downstream numbers are incorrect.

If all you have is their explanation (or worse, final results), with them having run hidden/unexplained code then it's not as easy to correct them, and you don't know whether their reasoning is incorrect or if it was caused by a software bug without a lot of work

@clacke @Canageek @mdhughes @rafial Another critical factor here is language drift. Even if you ignore hardware and specific software differences, languages change over time. This is even true of natural language.

While I do think the current pace of change is excessively fast, even Fortran and C got new specs every decade or so.

You need to be able to run old language versions on your new hardware, and old languages means old dependencies.

@urusan @clacke @mdhughes @rafial Yeah, but aren't compilers for F77 and ANSI C still being made for everything under the sun?

Sheldrick has said the reason his code has been so easy to port to everything is that he only used a minimal subset of Fortran when he wrote it.

I'm interested in how things like Fortran and C and LaTeX have stayed so popular and usable after so long. I wanted to read the Nethack 1.0 guidebook and it came as a .tex file, so I just ran pdflatex on it and boom, a usable PDF, something like 30 years later with no fuss. And yet try opening ANY OTHER file format from the 90s.

@Canageek @clacke @mdhughes @rafial Yeah, but those compilers don't just magically exist. They're being ported to new architectures and specific systems whenever they become available.

If this work wasn't being done by specialists, then these languages would eventually lose their relevance like so many other old languages.

@Canageek @clacke @mdhughes @rafial Of course, the F77/C porting train isn't going to stop anytime soon, because porting these languages gives a new architecture/system access to basically everything ever written. Thus, there's a strong incentive for the architecture designers and hardware manufacturers to make sure this happens, even if they have to pay for it.

@urusan @clacke @mdhughes @rafial Exactly. So science can piggyback off of them while waiting for high level work, but no one seems to, as demonstrated by how many versions of Python I have installed.

@Canageek @clacke @mdhughes @rafial Jupyter has a solution to this exact problem you are struggling with. A properly set up notebook will handle all this complexity for you.

@Canageek @clacke @mdhughes @rafial C and F77 are great and all, but they are low expressiveness languages. You have to write a lot of code to express a concept.

Python and Julia are far more expressive, and the code will look far more like the underlying mathematical concepts.

That's why most people want to move to these languages even when F77 and C are perfectly serviceable today.

@urusan @clacke @mdhughes @rafial Yeah, there is a reason I'm learning Python a little bit, loading data into it DOES seem somewhat easier, though I *hate* its syntax for some things as it gets REAL hard to read at times. variable.operation[index] wut

@Canageek @clacke @mdhughes @rafial You might like Julia more than Python, it's way more Fortran-like and drops the impure object orientation that's so popular in mainstream languages these days.

It's also Lisp for people who don't want to deal with Lisp syntax.


@urusan @clacke @mdhughes @rafial How hard is it to move to from a traditional programming background? Like, I'm NOT a programmer, but sometimes I want to automate a task or brute force a lot of math (i.e. I have three values from an elemental analysis and I want to know what range of solvent contamination would fit ±0.4% on each number).

I've done C and C++, but object orientation always hurt my head a bit. I did Fortran 90 once in 2005, and started in QBASIC back in high school since it was what my school could afford. So I don't really care about linguistic purity or anything, I just want a fancy calculator
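The elemental-analysis search described above is a nice fit for the "fancy calculator" style of Python. Here's a hedged sketch of what such a brute force might look like: the compound formula, solvent choice, and measured percentages are all invented for illustration, and the script just scans solvent fractions and keeps those that fit every element to within ±0.4%.

```python
# Brute-force sketch: which fractional amounts of a co-crystallized solvent
# (here dichloromethane, CH2Cl2) make calculated C/H/N percentages match a
# measured elemental analysis to within +/-0.4%? All formulas and measured
# values below are hypothetical.

ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.453}

def molar_mass(formula):
    """Molar mass of a formula given as an {element: count} dict."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())

compound = {"C": 20, "H": 24, "N": 2, "O": 2}  # hypothetical compound
solvent = {"C": 1, "H": 2, "Cl": 2}            # CH2Cl2
measured = {"C": 67.2, "H": 6.9, "N": 7.6}     # hypothetical analysis (%)

fits = []
for i in range(101):               # 0.00 to 1.00 solvent molecules per compound
    x = i / 100
    total = molar_mass(compound) + x * molar_mass(solvent)
    calc = {el: 100 * (compound.get(el, 0) + x * solvent.get(el, 0))
                * ATOMIC_MASS[el] / total
            for el in measured}
    if all(abs(calc[el] - measured[el]) <= 0.4 for el in measured):
        fits.append(x)

print("solvent fractions that fit:", fits)
```

Top-to-bottom, no classes, barely any functions, which matches the "just do a bunch of math in order" style mentioned later in the thread.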


@Canageek @clacke @mdhughes @rafial Object orientation is a fad IMHO.

I mentioned the impurity aspect because we gain even less from object orientation as it is usually implemented (as in Python or C++) than we would if it were properly implemented (as in Smalltalk or Ruby).

So you're right that OO gets in the way of expression, even if it's useful for engineering purposes sometimes (though Julia's multiple dispatch is just better).

@urusan @clacke @mdhughes @rafial Yeah, I work at a much more basic level, like, sometimes not even with functions, just start at the top and go to the bottom style of code, since I just need to do a bunch of math in order. So I don't even know what half of this means, to be honest.

@Canageek @clacke @mdhughes @rafial The main thing you can do with object orientation/single dispatch is to pass around a large, complex object graph that includes your entire set of assumptions.

This is very useful from an engineering perspective, as you can write cleaner, more reusable code that doesn't care about the underlying assumptions.

From a science communication perspective, it's terrible, as all those underlying assumptions are hidden.
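A minimal sketch of that trade-off (all names invented for illustration): the object-oriented version quietly carries its assumptions in the object, while the plain-function version forces every assumption to appear at the call site where a reader can see it.

```python
# Engineering style: the object carries its assumptions with it.
class Dataset:
    def __init__(self, values, baseline=0.0, drop_outliers=True, cutoff=1000.0):
        self.values = values
        self.baseline = baseline          # hidden assumption
        self.drop_outliers = drop_outliers  # hidden assumption
        self.cutoff = cutoff              # hidden assumption

    def mean(self):
        vals = [v - self.baseline for v in self.values]
        if self.drop_outliers:
            vals = [v for v in vals if abs(v) < self.cutoff]
        return sum(vals) / len(vals)

# Science-communication style: every assumption is visible at the call site.
def mean_signal(values, baseline, outlier_cutoff):
    vals = [v - baseline for v in values if abs(v - baseline) < outlier_cutoff]
    return sum(vals) / len(vals)

data = [10.0, 12.0, 11.0, 5000.0]
d = Dataset(data)
print(d.mean())  # which baseline? which cutoff? you have to read the class
print(mean_signal(data, baseline=0.0, outlier_cutoff=1000.0))  # all spelled out
```

Both calls compute the same number here; the difference is purely in how much a reader must dig to learn what was assumed, which is the communication cost being described.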

