Random insight of the night: every couple of years, someone stands up and bemoans the fact that programming is still primarily done through the medium of text. And surely, with all the power of modern graphical systems, there must be a better way. But consider:
* the most powerful tool we have as humans for handling abstract concepts is language
* our brains have several hundred millennia of optimizations for processing language
* we have about 5 millennia of experimenting with ways to represent language outside our heads, using media (paper, parchment, clay, cave walls) that don't prejudice any particular form of representation, at least in two dimensions
* the most wildly successful and enduring scheme we have stuck with over all that time is linear strings of symbols. Which is text.
So it is no great surprise that text is well adapted to our latest adventure in encoding and manipulating abstract concepts.
@rafial That's accurate, but it also misses the fact that Excel is REGULARLY misused for scientific calculations and near-programming-level work, since its GUI is so intuitive for doing math on things.
Like, GUI programming is HERE; we just don't want to admit it because of how embarrassing it is.
@rafial Now what we need to do is make a cheap, easy-to-use version of it that is designed for what scientists are actually using it for: column labels, semantic labels, faster calculations, better handling of mid-sized data (in the range of tens of thousands of data points), etc.
@Canageek I'm wondering, given your professional leanings, if you can comment on the use of "notebook"-style programming systems such as Jupyter and, of course, Mathematica. Do you have experience with those? And if so, how do they address those needs?
If you can't reproduce what was done from what is in the paper, you haven't described what you've done well enough. And redoing it is better than just rerunning the code: a bug might have been removed between software versions, you might notice something not seen in the original, etc.
However, the main alternative is to just eschew code entirely. I think this is valid, especially in fields where code is largely irrelevant and you can just provide your data and describe your statistical approach and let the reader deal with it.
@Canageek @rafial You aren't processing those ShelX files on any sort of hardware (or software binaries) that existed in the late 1960s. At best, you're running the original code in an emulation of the original hardware, but more likely you're running modern software designed to run on modern hardware.
Software archeology is inevitable and even desirable
What we want is an open platform maintained by software archeology experts that lets users not sweat the details
However, natural language and scientific techniques naturally change over time too, so it's inevitable that we will have to cope with change.
We already have to do this, it's just our brains do a good job smoothing inconsistencies out.
@urusan @rafial No, they've kept updating the software since then so it can use the same input files and data files. I'm reprocessing the data using the newest version of the software using the same list of reflections that was measured using optical data from wayyyy back.
The code has been through two major rewrites in that time, so I don't know how much of the original Fortran is the same, but it doesn't matter? I'm doing the calculations on the same raw data as was measured in the 60s.
There is rarely a POINT to doing so rather than growing a new crystal, but I know someone who has done it (he used Crystals rather than Shelx, but he could do that because the modern input file converter works on old data just fine).
The thing that's hard to keep running decades later is the code, and code is becoming more and more relevant in many areas of science.
Keeping old code alive so it can produce consistent results for future researchers is a specialized job
Ignoring the issue isn't going to stop researchers from using and publishing code, so it's best to have norms
@urusan @Canageek One other thing to keep in mind is that data formats are in some ways only relevant if there is code that consumes them. Even with a standard, at the end of the day a valid PDF document is, by de facto definition, one that can be rendered by extant software. The same goes for ShelX scripts. To keep the data alive, one must also keep the code alive.
There are *six* programs I can think of that can process hkl data and model it (shelx, crystals, GSAS-II, Jana, olex2), so it doesn't REALLY matter which you use, or whether any of them are around in ten years, as long as there is *A* program that can do the same type of modeling or better (being able to read the same input file is a really good idea as well, as it makes things easy).
If a solution is physically relevant any program should be able to do the same thing.
I mean, modern versions of Fortran aren't any harder to write than C, which is still one of the most used programming languages on the planet. I don't see why everyone makes fun of it.
@Canageek @rafial @urusan I'm kind of not making fun of Fortran, though the last time I saw any in production it was still F-77, because F-90 changed something they relied on and was too slow; I last worked on some F-77 for the same reason ~30 years ago.
I am indeed making fun of COBOL, but it'll outlive us by thousands of years as well.
Stable languages are good… but also fossilize practices that we've improved on slightly in the many decades since.
> SHELX is developed by George M. Sheldrick since the late 1960s. Important releases are SHELX76 and SHELX97. It is still developed but releases are usually after ten years of testing.
This is amazing.
@clacke @mdhughes @urusan @rafial Yeah, the big worry is that George Sheldrick is getting very, very old, and there are worries about whether anyone will take over maintaining and improving the software when he dies. Luckily, its largest competitor does have two people working on it (the original author and a younger professor), so it has a clear succession path.
The semantics of, say, addition aren't going to change. Once you define c = a + b as adding a and b and then assigning the value to c, you no longer need a reference implementation, and you can treat this code like a well-defined data format.
Of course, I'm leaving out a lot of detail here, like what do you do on overflow?
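To make the overflow question concrete, here is a minimal Python sketch of two readings of c = a + b that agree on small numbers and disagree once the sum leaves a fixed-width range. The 32-bit wrapping behavior is an assumption chosen for illustration (it is what typical two's-complement hardware does; in C, signed overflow is actually undefined behavior), and both function names are hypothetical.

```python
# Two plausible meanings of "c = a + b" that only disagree on overflow.
# Python's own ints are arbitrary precision and never overflow; the
# 32-bit variant is a hypothetical stand-in for a fixed-width register.

def add_unbounded(a: int, b: int) -> int:
    """Mathematical addition, as Python's built-in int does it."""
    return a + b

def add_wrapping_int32(a: int, b: int) -> int:
    """Addition that wraps around like a 32-bit two's-complement register."""
    result = (a + b) & 0xFFFFFFFF      # keep only the low 32 bits
    if result >= 0x80000000:           # reinterpret the top bit as a sign bit
        result -= 0x100000000
    return result

a, b = 2_000_000_000, 2_000_000_000
print(add_unbounded(a, b))        # 4000000000
print(add_wrapping_int32(a, b))   # -294967296
```

Both functions are "addition"; a spec (or a reference implementation) has to say which one c = a + b means before the code can be treated as a well-defined format.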
@clacke @Canageek @mdhughes @rafial Having a reference implementation just lets you defer to the reference implementation as your spec, and if it's on a well known platform then it can be reasonably emulated on different hardware.
When you think about the reference implementation as a quasi-spec, then it becomes clear that most mainstream languages already have a reference implementation, and thus one of these quasi-specs already.
Just because we can theoretically re-implement Python 2.5.1 as it would run on a 64-bit x86, on your future 128-bit RISC-V processor, doesn't mean that you would want to.
You just want to see the results, and you don't want them to differ, say because of the 64-bit vs. 128-bit difference.
A standard platform facilitates this.
It's necessary for them to be able to cope, so the end user can ultimately have a smooth experience, and get back to their scientific research.
You don't want the same results, you want the *best* results. If newer versions of the code use 128-bit floating-point numbers instead of 64-bit, GREAT. Less rounding error.
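A rough way to see the "more bits, less rounding error" point without special hardware: Python's built-in float is a 64-bit double with no 128-bit counterpart, so this sketch uses the standard decimal module with 16 and 34 significant digits (the precisions of the IEEE 754 decimal64 and decimal128 formats) as a stand-in. It computes (1/3) × 3, which should be exactly 1, and reports how far off each precision lands. The helper name is hypothetical.

```python
from decimal import Decimal, localcontext

def rounding_error(digits: int) -> Decimal:
    """Return 1 - (1/3) * 3 computed with `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits                  # precision applies inside this block only
        third = Decimal(1) / Decimal(3)    # rounded to `digits` digits
        return Decimal(1) - third * 3      # the exact answer would be 0

print(rounding_error(16))   # 1E-16
print(rounding_error(34))   # 1E-34
```

The wider format still isn't exact; the error just shrinks by many orders of magnitude, which also means its answers differ slightly from the narrower format's.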
It's like, I can create this model in Shelx or Crystals. They don't implement things EXACTLY the same, but a good, physically relevant model should be possible to create in either. If I try to do the same thing in two sets of (reliable) software and it doesn't work in one, perhaps I'm trying to do something without physical meaning?
Like, it shouldn't matter whether I use the exact same Fourier transform or do the analysis in R, SAS, or Python. It should give the same results. Stop focusing on code preservation and focus on making analysis platform-agnostic.
Also, it is going to be *helllll* for someone in 20 years. I know a grad student in physics who has to revisit some code his prof wrote back when he was in grad school. On the upside, it is apparently well documented. On the downside, the documentation is all in Polish, as that is the prof's first language and where he went to grad school, whereas the grad student only speaks English.
Now nuclear physics is a bit of an exception, but asdfljk that sounds like hell.
I'd like to add that the norm of a Jupyter notebook additionally promotes the explanation of whatever you are doing in the code.
You're clearly supposed to interleave explanation (okay, now I'm doing this to the data) and code (here is exactly what I did in a format a computer can replicate).
This gives you the best of both worlds.
If all you have is their explanation (or worse, final results), with them having run hidden/unexplained code, then it's not as easy to correct them, and you can't tell whether their reasoning is incorrect or the error was caused by a software bug without a lot of work.
@clacke @Canageek @mdhughes @rafial Another critical factor here is language drift. Even if you ignore hardware and specific software differences, languages change over time. This is even true of natural language.
While I do think the current pace of change is excessively fast, even Fortran and C got new specs every decade or so.
You need to be able to run old language versions on your new hardware, and old languages mean old dependencies.
Sheldrick has said the reason his code has been so easy to port to everything is that he only used a minimal subset of Fortran when he wrote it.
I'm interested in how things like Fortran and C and LaTeX have stayed so popular and usable for so long. I wanted to read the Nethack 1.0 guidebook and it came as a .tex file, so I just ran pdflatex on it and boom, a usable PDF, something like 30 years later, with no fuss. And yet try opening ANY OTHER file format from the 90s.
If this work wasn't being done by specialists, then these languages would eventually lose their relevance like so many other old languages.
@Canageek @clacke @mdhughes @rafial Of course, the F77/C porting train isn't going to stop anytime soon, because porting these languages gives a new architecture/system access to basically everything ever written. Thus, there's a strong incentive for the architecture designers and hardware manufacturers to make sure this happens, even if they have to pay for it.
@urusan @clacke @mdhughes @rafial That is fair; I'm from an area of science where you don't go into other people's work like that very often. We are far more likely to remake a compound and do all the measurements over again than we are to try to figure out what someone else did wrong.
If we find a difference between our results and the published ones the older ones probably had an impurity or something and it isn't really worth worrying about. Heck, sometimes you even get COLOUR differences when you make literature compounds, like white crystals vs red crystals.
Plus, C and Fortran used to be high-level languages back in the day. All of these languages are portable across computer architectures.
Python has been changing rapidly because it has been transforming into a better form for the long haul, and Julia's changes over the last few releases have been much less disruptive. They'll settle down.
@urusan @clacke @mdhughes @rafial I've been tempted to stop teaching myself Python and learn something more stable like Lua instead. Everyone else is using Python, but it gets more painful to use every year.
I used to just download an exe of PyMOL and run an installer, and now I need to use some garbage called pip, and heaven help you if you use the wrong set of install instructions or run pip instead of pip3 or vice versa.
Then there is the crystallography software that hasn't updated its install instructions since 1999, where you have to manually add a bunch of stuff to the PATH and manually tell it where your web browser, POV-Ray, IrfanView and text editor executables are, but I'm confident it will still work next year.