Definitions of “interpreted language” and “compiled language” with explanations of why Python and Java are or are not such languages

jbw · September 6, 2024, 4:38pm

i’m not sure whether this belongs in the education category or a more general category.

i’m dissatisfied with the definitions of “interpreted language” and “compiled language” that i have seen.

as an educator, i need to explain these concepts to students.

many people classify the programming languages python and java differently for these two terms. often java is classed as compiled but not interpreted and python as interpreted but not compiled. for me, this seems to require splitting hairs quite finely, as both typical java implementations and the main python implementation: (1) never interpret the source code directly but (2) instead first translate (compile) the source code to machine code for a virtual machine and (3) store files containing this compiled code.

it seems to me that the notable differences between java and cpython are that (a) the virtual machine used changes with each cpython version and (b) cpython handles nearly all details of compilation automatically to the extent that some cpython programmers are completely unaware that cpython is doing any compilation. it is far from clear to me why these differences should be the difference between “interpreted” and “compiled”.

personally, i would class both java and cpython as “compiled” due to the fact that source code is always translated to a virtual machine code. i usually consider both java and cpython to not be “interpreted”, because their interpreters, i.e., their virtual machines, interpret the machine code and not the source code.
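A minimal sketch of points (2), (3) and (a), assuming a CPython-like implementation that supports cached bytecode files. The file name `example.py` is only an illustration and no file is written; the calls used are the built-in `compile()` and `importlib.util` from the standard library.

```python
import importlib.util
import sys

source = "total = 1 + 2\n"

# Point (2): the source text is translated into a code object holding bytecode.
code_obj = compile(source, "example.py", "exec")
bytecode = code_obj.co_code  # raw bytes for the implementation's virtual machine

# Point (3): where a compiled file for example.py would be cached.
# The name carries an implementation/version tag, which hints at point (a).
cache_path = importlib.util.cache_from_source("example.py")

# The magic number at the start of every cached file changes with the bytecode format.
magic = importlib.util.MAGIC_NUMBER

print(type(code_obj).__name__, len(bytecode))
print(sys.implementation.cache_tag, cache_path, magic)
```

Running this under two different CPython versions should print different cache tags, which is one way to show students that the virtual machine is version-specific.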

i’ve looked at the material on wikipedia related to this and find it quite unsatisfying.

how do you classify these two languages regarding whether they are “interpreted” and/or “compiled”?

can anyone give pointers to clear and logically coherent discussions of this?

thanks for your thoughts!

MegaIng · September 6, 2024, 4:47pm

Compiled vs Interpreted is a pointless distinction most of the time.
This is a property of language implementations, not of the languages themselves (Cython and mypyc are fully, unambiguously compiled implementations of (a subset of) Python).
This is further muddled by things like JIT compilers, which are built into the most common Java implementations but not (yet) into CPython.
What people normally mean when calling Java compiled is that it is statically type-checked in a clearly distinct step before execution. You can also distribute compiled Java bytecode, and this is a common distribution format (.jar files), which isn’t true for Python.
Where compiled vs interpreted makes sense is if you are arguing in abstract computer science and talking about models of computation and how they relate to the levels above and below them. But here it gets confusing: x86 assembly is compiled, but x86 machine code is interpreted, which really doesn’t match what most people expect these terms to mean.
Same goes for python and java: The user-readable languages are compiled, so that then the resulting bytecode can be interpreted.

(this is a list instead of a coherent text because I can’t be bothered to put in the joining sentences right now)

jbw · September 6, 2024, 5:07pm

i think i mostly agree with @MegaIng, but i am intrigued by two things:

that’s a fascinating definition. can you point anywhere else that defines “compiled” that way?

also an interesting criterion for “compiled”. it would not have occurred to me to take this as part of the definition of “compiled”. [edit: this is primarily due to the fact that cpython’s vm can and generally does change between cpython versions.]

MegaIng · September 6, 2024, 5:10pm

No, this is my interpretation of what people mean when they call some languages compiled and others not if they put java into one category and python into the other. I don’t have a source for that, and it might be wrong. You would have to ask a bunch of people for that, but I would imagine that if you ask them to justify their decision, their reasoning changes. Java feels like a compiled language, but python doesn’t, and I think this is the reason why.

bwoodsend · September 6, 2024, 6:23pm

Could it be as simple as:

python original-source.py

works but:

java OriginalSource.java

doesn’t?

Personally, I’d say that they are, strictly speaking, both compiled AND interpreted, since they both convert source code to something else, and neither of those something elses is native machine code, so both require an interpreter.
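A rough sketch of the "compiled AND interpreted" view for CPython: compile a source file to a `.pyc`, delete the source, then have the interpreter execute the stored bytecode. The helper name `run_from_pyc` and the file names are made up for illustration, and the 16-byte header size is an assumption that holds for CPython 3.7 and later.

```python
import marshal
import os
import py_compile
import tempfile


def run_from_pyc(source_text):
    """Compile source_text to a .pyc, remove the source, run the bytecode."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "demo.py")
        pyc_path = os.path.join(tmp, "demo.pyc")
        with open(src_path, "w") as f:
            f.write(source_text)

        # The "compiled" half: source file in, bytecode file out.
        py_compile.compile(src_path, cfile=pyc_path, doraise=True)
        os.remove(src_path)  # from here on only the compiled file exists

        with open(pyc_path, "rb") as f:
            data = f.read()

    # Assumption: CPython 3.7+ writes a 16-byte header before the marshalled code.
    code_obj = marshal.loads(data[16:])

    # The "interpreted" half: the bytecode is executed by the interpreter.
    namespace = {}
    exec(code_obj, namespace)
    return namespace


ns = run_from_pyc("answer = 6 * 7\n")
print(ns["answer"])  # 42
```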

Rosuav · September 6, 2024, 6:44pm

“Compiled” and “Interpreted” are largely meaningless without context. You have to first define the terms, and when you do, they’re still probably not very helpful. For example, let’s define “compiled” as “cannot be directly run without first being put through a compilation process”. This means that Java is compiled, since it has to be first run through javac to produce a .class file, which is then runnable. Great! But what about this:

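# assumes the .java file is passed as the first argument and defines a class of the same name
javac "$1" -d /tmp
java -cp /tmp "$(basename "$1" .java)"
rm "/tmp/$(basename "$1" .java).class"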

There. Now Java has just become an interpreted language, since you can directly run the source code without a separate compilation step. The language didn’t change, even the compiler/interpreter is the same, and yet by the (fairly reasonable) definition given, Java just changed from being a compiled language to being an interpreted one.

(I couldn’t figure out a way to write the .class file to stdout and pipe it directly into java but that would be even more elegant.)

At the other extreme, a purely interpreted language would have to somehow be directly executed, instruction by instruction, straight from the file. That could certainly be done with a Turing tarpit, but if we ignore pathological cases, the most likely place you’ll find that sort of behaviour is shell scripts. But, in fact, that isn’t even always the case; if you look at advice surrounding the curl SOME_URL | sudo bash idiom, the most sound advice is “don’t do that, like, EVER”, but the second most sound piece of advice is “wrap the whole file in a block so that it won’t run anything until it has the whole thing”. In other words, even in a shell script, there is value in forcing it to be fully parsed prior to execution.
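For comparison, Python behaves like the "wrapped in a block" shell script by default: the whole text is parsed and compiled before any of it runs. A small sketch, using a hypothetical helper `try_to_run` and a list named `log` to record which lines actually executed:

```python
def try_to_run(source):
    """Return (entries logged by the code, name of the error raised or None)."""
    log = []
    try:
        code_obj = compile(source, "<demo>", "exec")  # the whole text is parsed here
        exec(code_obj, {"log": log})
    except Exception as exc:
        return log, type(exc).__name__
    return log, None


# A syntax error on line 2 stops line 1 from ever running.
syntax_case = try_to_run("log.append('line 1')\nlog.append('line 2'\n")

# A runtime error on line 2 is only noticed after line 1 has run.
runtime_case = try_to_run("log.append('line 1')\nmissing_name\n")

print(syntax_case)   # ([], 'SyntaxError')
print(runtime_case)  # (['line 1'], 'NameError')
```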

So what constitutes “compiling”? Does there have to be a saved-to-disk binary? Does the binary have to be, in some way, executable? What if the executable language happens to be JavaScript, as can be done with something like Asm.js?

The reason you’re dissatisfied with them is, almost certainly, that you have never seen good definitions. And that’s because there just aren’t good definitions.

Pigeonholing languages into “interpreted” and “compiled” is just as futile as pigeonholing them into “pass-by-value” vs “pass-by-reference”, or “statically-typed” vs “dynamically-typed”, or “strongly-typed” vs “weakly-typed”, or “programming language” vs “scripting language”, or anything else. At best, once you pin down your definitions, you end up with something that might be of some value, but will be confusing to anyone who has different definitions. At worst, they’re just fuel for interminable debates about “my language is better than your language”, and let’s be honest, we can debate that without any fuel whatsoever…

petersuter · September 6, 2024, 6:55pm

Relatively common terminology is that:

CPython has 1. a source-to-bytecode compiler and 2. a bytecode interpreter.
Java has 1. a source-to-bytecode compiler and 2. a JIT compiler.

You can focus on the first part and consider both to be compiled.
Others may focus more on the second part and consider CPython (bytecode) to be interpreted, but consider Java to be (JIT) compiled.
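Part 1 can be observed from inside Python with the standard `dis` module. This is only a sketch; the function `add` is an arbitrary example, and the instruction names printed differ between versions and implementations, so no particular output is assumed.

```python
import dis


def add(a, b):
    return a + b


# The function body was already compiled to bytecode before it could be called;
# dis lists the instructions that the bytecode interpreter (part 2) will execute.
op_names = [instr.opname for instr in dis.get_instructions(add)]
print(op_names)
```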

nedbat · September 6, 2024, 7:19pm

As further demonstration that these distinctions are about implementations, not languages, PyPy is exactly like Java: 1. a source-to-bytecode compiler and 2. a JIT compiler as part of the execution engine.
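Since the answer depends on the implementation, it can help to check which one is actually running. A tiny sketch using only the standard library:

```python
import platform
import sys

# Typically "CPython" or "PyPy"; other implementations report their own names.
implementation = platform.python_implementation()
print(implementation, sys.implementation.name)
```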

funkyfuture · September 6, 2024, 10:40pm

this distinction comes from a time when most languages in use really were either one or the other. an important distinction that directly matched it was what kind of file was used for execution: source code (and, increasingly, bytecode) or a sequence of natively executable CPU instructions (possibly with embedded/linked data and native code for other processors).

i think it’s worthwhile to let your students discuss which practical impacts this runtime difference has on aspects of software development, distribution, deployment and usage.

avi.gross · September 7, 2024, 3:01am

I think we are barking up the wrong tree in some ways.

A real question is related to education and how suitable some languages are versus others. The compile/interpret differentiation is not what I see as even slightly important.

Let me explain. Back when I was in high school, I used to sneak into the college computer lab and play around and largely self-educate on languages like BASIC or FORTRAN and others. For BASIC and some others, I used a teletype that printed on rolled paper and could save or load from paper tape. FORTRAN not so much.

There was then a significant difference in interactivity. FORTRAN had to be typed onto punch cards, put in order and so on, and when I got the inevitable errors, it could take days to make replacement cards and wait for the result. Nothing ran at all until there were no errors, and even once it compiled, logic errors took even more time. BASIC let me edit as I went and insert new lines of code, and it would run until it hit an error. It had the capacity to be interactive and stop and ask for input and so on. The FORTRAN I used required input to be put into a data section after the main program, so it was provided statically.

Years later I was teaching computer languages like Fortran which required submitting decks of punch cards and getting back results in hours or even days as reams of printouts. At the same time, I was able to do my own work on a minicomputer using languages like PASCAL where the edit/compile/edit/compile/edit/run went a bit faster and I had other better facilities like editors. Sort of half interactive.

I then switched to C and later C++, and even some S, along with the many mini-languages UNIX came with, including sh/csh/ksh, while at Bell Labs. I noted that some of the languages I used, such as AWK or Perl, were more interpreted and at the time really were interpreted live.

What I think counts is the ability to get partial feedback even before sending in a perfect program. Perhaps not universally, I associate compilers with being rather unforgiving and interpreters with working with you up to some point before they object. In particular, languages like Python and R let you pause in mid-program and ask for the values of variables or evaluate some statement, so you can see whether things are as expected or have diverged and need debugging. Environments like RStudio actually have a window that can show the values of all variables at a glance.

But many compiled programs can be run in a debugger that allows you to similarly pause and examine and even make changes on the fly. Someone with enough training can actually do quite a bit. Realistically, every compiled language is simply interpreted once, while many interpreted languages actually half-compile when they can, including some able to compile small segments on demand, just in time.

Many interpreted language programs are run without human connection once they are ready and debugged. Whether they evaluate the original code or some kind of byte code or have been converted to machine language may only matter in some edge cases, such as where the code creates some new code dynamically and evaluates it.
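One such edge case, sketched in Python: code that builds new source text at runtime and evaluates it. The helper `make_power_function` is hypothetical. Even here the generated text goes through `compile()` before it runs, so the compiler has to be available at runtime.

```python
def make_power_function(exponent):
    """Build, compile and run new source code while the program is running."""
    source = f"def power(x):\n    return x ** {exponent}\n"
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["power"]


cube = make_power_function(3)
print(cube(2))  # 8
```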

What is important depends on the education being imparted. Some courses teach ideas and concepts within computer science and sometimes something like Turtle Graphics meets some such needs well enough. Others want to teach you how to actually get things done as an individual and perhaps in a narrow range of applications and some languages may be fine for that.

Yet others want you to learn to work with groups and cooperate and maybe even use other forms of parallelism and very different languages may meet those needs.

If the goal is education or the goal is getting jobs and so on, some may be a better fit. Today, there are many jobs wanting a Python programmer but others value something like Rust.

Teaching some people many things can expand their horizons. Other people seem to want to learn just one thing and will mostly be confused that different languages have so many different variations.

But one reality is that many of these other considerations give little weight to whether a language is interpreted or half-interpreted and more to how well it suits the need.

Python once suited some needs well enough that it became a teaching language in many eyes. But I would suggest that it has been changed so much that perhaps it is no longer ideal for simple introductory classes. There are too many ways to do anything, and when students ask how to do things, I regularly see advanced answers offered at the wrong level for them to meet classroom expectations.

If an assignment is teaching you simple ways to use lists, it may be too early and confusing to suggest a one-liner using list comprehensions nested or suggest they use numpy. Those are nice things for after they have learned some basics.

So, what do we mean by interpretation versus compilation? I program in oodles of languages and notice some differences that can make a language easier or harder to work on linearly or in other ways, and that may almost dictate whether it should be seen as compiled or interpreted. I do note that, despite what some say, Python may increasingly be mostly compiled as ever more functionality is rewritten to call functions in libraries created using languages like C/C++ and others. It is mainly the uppermost levels that can be seen as interactive or …

Consider languages that require a variable to be declared, including the exact type, before it can be used. Functions that call other functions must be shown in a way that allows the inner call to be declared or even defined earlier in the code. Other languages scan down within a file and automatically note all function names declared and go back and work on interpreting or compiling. It is hard to look ahead while I am still typing!

Yet others just create a variable when it is used. Moments later it can be given some other type to hold and it just works. Some play all kinds of different games on when variables are in scope or whether you can have many with the same name but different signatures and many more such concepts.

Some concepts make it fairly hard to be interactive as when I am typing in line by line and forgot to declare a variable earlier or realize my function calls another not yet defined. This means more planning in advance, or saving code in an editor, modifying, and resubmitting.

There is no right or wrong here, merely choices. Some are great for production, where errors must be avoided, but at the same time are not a great match for education.

So, no matter how you choose to educate students, they need to know a bit about the rest of the world and not expect much. Another thread here has been discussing whether Python could benefit from some form of delayed evaluation. Languages like R have always had that and may want some form of forced immediacy. In my experience, people who learn both can experience quite a bit of cognitive dissonance. An ideal education tool may still need to be coupled with some education about other possible ways, if only as a warning. Your first language may leave a serious mental imprint.

Luckily for me, I never seem to have had a first language in anything as I typically encounter many at the same time, LOL!

tunedal · September 7, 2024, 11:56am

It used to be the case that java OriginalSource.java didn’t work, but in Java 11 and later it does in fact work. So by this definition there is no longer any difference between Python and Java in this regard.

JEP 330, implemented in Java 11, specifies the feature for running single-file programs directly from the source files and JEP 458, implemented in Java 22, extends it to multi-file programs.

Rosuav · September 7, 2024, 12:36pm

Well, well, well. And I thought it required a three-line shell script to achieve that.

voidspace · September 12, 2024, 9:20am

This is how I explain it:

Both Java and C# are typically called compiled languages but are in fact bytecode compiled just as Python is, but Python is typically called an interpreted language.

Python has an interpreter, which at its heart is a big loop (“the interpreter loop”) with a switch statement over the bytecode (although that architecture is gradually becoming more complex, it’s still basically true). This interpreter loop interprets the bytecode by executing its instructions individually and sequentially.
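The shape of such a loop can be sketched with a toy stack machine. To be clear, this is not CPython's actual loop or instruction set: the opcodes `PUSH`, `ADD` and `MUL` and the function `run` are invented for illustration.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a small stack machine."""
    stack = []
    pc = 0  # index of the next instruction
    while pc < len(program):      # the interpreter loop
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":          # the "switch" over the opcode
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# Toy "bytecode" for (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)  # 20
```

Each instruction is decoded and dispatched every time it is reached, which is the cost a JIT compiler tries to remove by emitting machine code instead.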

Both Java and C# have an additional “compile phase”. With Java this is the world-class Just In Time (JIT) compiler HotSpot, which emits machine code on the fly at runtime. It is this machine code that is executed.

.NET (C#) by default also JIT-compiles its bytecode (IL, shipped in .NET assemblies, i.e. .dlls) at runtime, and additionally offers Ahead of Time (AOT) compilation to machine code. So again it is machine code that is being executed.

So C# and Java have in common with languages like C and C++ that the source code is compiled to machine code and it is the machine code which is executed.

PyPy has a JIT (and other JITs for Python exist, like Numba, or are being developed). The PyPy JIT only compiles hot loops, so it is still not always machine code being executed, but the distinction between compiled and interpreted continues to blur.

Python has an interactive interpreter, allowing runtime (and interactive) execution of code. This is another feature typically found in interpreted languages and not compiled ones.
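The interactive behaviour can be reproduced with the standard `code` module. A minimal sketch: `runsource` returns `True` when the input is incomplete and more lines are needed, and `False` once the input has been compiled and run (or rejected). Note that each complete input is still compiled before it is executed.

```python
import code

interp = code.InteractiveInterpreter()

# An unfinished block: the interpreter asks for more input instead of failing.
needs_more = interp.runsource("def double(x):")

# A complete statement: compiled and executed immediately.
finished = interp.runsource("total = 20 + 22")

print(needs_more, finished, interp.locals["total"])  # True False 42
```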

avi.gross · September 12, 2024, 5:19pm

Welcome, Michael.

In brief, it sounds like when people talk about interpreted, it might be more about the human experience than the technicalities most people do not see or care about.

Computer programs have gotten faster so rapid feedback is possible whether interpreted or compiled in small batches into pseudocode or assembler or machine language. What matters is being able to try something out without having all of it done and without errors.

Another important factor in using a language like Python for education, of course, is the availability of smart editors and other tools.