The Long-Term Problem With Dynamically Typed Languages
This may be the only time I weigh in on the static vs. dynamic typing discussion. Each side has its extreme proponents, and people differ in their ability and desire to work in systems with implicit invariants.
Many years ago, back when Java and C++ were the Mainstream Languages and Python was the shiny new up-and-comer, I read Bruce Eckel's arguments in support of dynamically typed languages, and some of the nonobvious (at the time) ways you can get more done at higher quality in a more flexible language.
If I recall correctly, the arguments were that type systems can be approximated with unit tests (neither subsumes the other), and the ease of getting code up and running in a dynamically-typed language outweighs the benefits static types provide, especially when (explicit) static types require more thought and commitment before the problem space has been developed and understood. That is, dynamic languages are more fluid, and you can test bits of the program even before they're made to fit with the rest of the code. In statically typed languages, on the other hand, the code must compile before it can be run at all.
Note: I feel a temptation to qualify every statement I'm making, because the programming language space has since been explored in more depth, and significantly more nuance is understood now than was in the late 90s and early 2000s. We are in the midst of a programming language renaissance.
I have some empathy for Bruce Eckel's argument. Early on in a system's life it is important to have some flexibility and "play". However, as the correct shape or factoring of the software is discovered, it becomes useful to have the computer enforce more and more invariants.
Note that some people treat static typing vs. dynamic typing as a binary switch. In reality it's a bit more of a continuum. Python will tend to throw exceptions rather than implicitly convert values. JavaScript will implicitly convert values sometimes, but not always. PHP tries to make whatever code you wrote run, even if it's nonsensical. Thus, among dynamically-typed languages, some languages tend to catch type errors sooner, simply because they're less forgiving at runtime.
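To make Python's end of that continuum concrete, here is a small sketch (the helper name `try_add` is made up for this illustration). Mixing a string and an integer raises `TypeError` at the point of the operation rather than guessing at a conversion:

```python
def try_add(a, b):
    """Return a + b, or the name of the exception Python raises instead."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


# Python refuses to guess how to combine a string and an integer...
mixed = try_add("1", 1)

# ...while an explicit conversion makes the intent unambiguous.
as_number = try_add(int("1"), 1)
as_text = try_add("1", str(1))
```

Here `mixed` ends up holding the string `"TypeError"`, while the two explicit conversions yield `2` and `"11"`. JavaScript, by contrast, evaluates `"1" + 1` to `"11"` without complaint.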
IMVU's technology stack is weighted heavily towards dynamic languages. This is largely a product of the year of its birth. Everyone was writing PHP and Python in the mid-2000s. The website backend consists of a prodigious amount of PHP. The website frontend is JavaScript, because that's really the only option. The client is built out of C++ (for the engine), Python (for all the business logic), and JavaScript (for the UI).
After ten years, however, I can look back and speak with confidence about the long-term effects of building a business on top of dynamic languages.
Most of the core PHP APIs from IMVU's early years are still around. And they're terrible. They're terrible but they're "impossible" to change, because the iteration time on running all the tests, verifying in production, and so on is never worth the smallish, indirect benefits of improving the APIs. Obviously, nothing is impossible, but relying on a giant test suite and test infrastructure to prove the correctness of renaming a function or adding a parameter, in practice, is a significant coefficient of friction on the software's ability to evolve over time.
Not improving core APIs results in a kind of broken windows effect. When APIs are confusing or vague, people tend not to notice other confusing or vague APIs, and it slows everyone down in the long term.
Thus, instead of easily refactoring the legacy APIs, people think "I'll just make a new one and migrate the code over!" And now you have two hard-to-change APIs. And then three. And the cycle continues. Additionally, this cycle is fed by architect types who know or think they know a better way to do things, but can't be bothered to update the old systems.
With a type system (such as in C++, Go, Haskell, or Java), the iteration time on renaming a function or changing its signature is a single build. In Haskell, it's even better: simply type ":r" into ghci and see all the places where the code needs to be updated. That is, a type system flattens the cost of change curve. Small API or performance improvements that otherwise wouldn't be worth it suddenly are, because the compiler can quickly tell you all the places that need to be updated. You don't need to run all the tests across the history of the company in order to see whether you've missed a spot.
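A minimal sketch of the other side of that comparison, in Python. The function and caller names are hypothetical and not taken from IMVU's code. After a parameter is removed, a call site that was missed loads without complaint and fails only when that exact line executes:

```python
def send_gift(sender_id, recipient_id):
    """New signature: a former third `message` parameter has been removed."""
    return f"{sender_id} -> {recipient_id}"


def updated_caller():
    # This call site was updated along with the signature.
    return send_gift(1, 2)


def forgotten_caller():
    # This call site still passes the removed argument. Nothing complains
    # when the module is loaded; the mistake surfaces only if this line runs.
    return send_gift(1, 2, "happy birthday")


def run(caller):
    """Call `caller` and report either its result or the error it raised."""
    try:
        return caller()
    except TypeError as exc:
        return f"TypeError: {exc}"


ok = run(updated_caller)
stale = run(forgotten_caller)
```

Unless some test happens to execute `forgotten_caller`, the stale call goes unnoticed. In a statically typed language the equivalent call would be rejected by the build before anything runs.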
There's another huge cost of dynamic languages. It has to do with engineers building an understanding of existing systems. For all the mental energy spent on whitespace and curly braces, the most important component of understanding software is grasping data flow. Programs exist to transform data, and understanding how that's done is paramount. Types accelerate the process of building a mental understanding of the program, especially when lightweight types such as CustomerId (vs. Int) are used.
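As an illustration of what a lightweight type looks like, here is a sketch using Python's `typing.NewType`. It assumes an external static checker such as mypy is run over the code. `RoomId`, `load_customer_name`, and the in-memory lookup table are invented for the example:

```python
from typing import NewType

# Hypothetical domain types: both are plain ints at runtime, but a static
# checker such as mypy treats them as distinct from int and from each other.
CustomerId = NewType("CustomerId", int)
RoomId = NewType("RoomId", int)


def load_customer_name(customer_id: CustomerId) -> str:
    """Look up a display name; the signature says which kind of int it wants."""
    names = {CustomerId(42): "alice"}  # stand-in for a real data store
    return names.get(customer_id, "unknown")


customer = CustomerId(42)
room = RoomId(42)

name = load_customer_name(customer)

# A type checker would reject the next call because a RoomId is not a
# CustomerId. Plain Python still runs it, since both are ints underneath.
mixed_up = load_customer_name(room)  # type: ignore[arg-type]
```

The signature alone tells a reader which data flows into the function. Because Python does not enforce the annotation at runtime, the mixed-up call quietly returns a customer name for a room id, which is exactly the kind of mistake a checker catches before the code runs.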
I can't say I regret IMVU building its technology on top of PHP or Python or JavaScript, but I now believe that the long-term reduction in agility is not worth the short-term benefits, especially given the plethora of other options: Java, Go, Haskell, perhaps F# or some other ML, and so on.
I wonder how Facebook feels on this topic. :)
See discussion on Hacker News (2015) and /r/programming.
"Most of the core PHP APIs from IMVU's early years are still around. And they're terrible. "
Reads like a PHP rant (which is laudable), but it took you a decade to find out?!?
And then you want to kick Python out with the bath water? Calm down lad before you progress to self-harm.
> the iteration time on renaming a function or changing its signature is a single build.
Don't tests help here?
PHP is getting better. There are type hints for object arguments. Type hints for scalars and return values are coming in 7.0. It's far from perfect but introduces partial type-safety. Value objects also help. Facebook has Hack/HHVM - I don't think they use it for most of their infrastructure yet.
The argument that refactoring is not worth it because tests take too long to run does not make any sense to me. Can you elaborate?
@kp: Tests absolutely help. In fact, they're absolutely critical in a large Python or JavaScript or PHP codebase. However, when all you want to do is, say, remove a parameter from a function, a statically-typed language compiler would tell you all the places you need to change in just a couple minutes. But, in a dynamic language, you have to run the tests multiple times to make sure you catch everything, and even still you're relying on perfect test coverage.
@Filip: PHP is absolutely improving. We love type hints. The next step is a compiler that runs ahead of time and tells you everything that's wrong! We're not far from a world where PHP is a very interesting "gradually-typed" language.
OK, it's understood that you would need full coverage for your dynamic language code. That would probably mean all of the conditionals. And it feels like you're saying it's too much. OK then, what is your target coverage for a statically typed language? Which part of the code will you not test then? Seems to me that you still want to test all the code.
Isn't this a solved problem, by statically typed languages that do most of the dirty work in creating type labels for you? E.g. Scala, or ones based on Hindley-Milner?
I'd rather work on the challenge of creating more perfect test coverage than have to deal with all the extra boilerplate that comes with rigid typing rules.
Certainly this lack of ability to refactor is not an inevitable situation with dynamically typed language code bases, just something that is possible to happen if the investment in testing isn't made.
On the dynamic side I work mostly in Python. It has not been my experience that types in my C++ make understanding the dataflow easier than not having them in Python. I agree that in static languages I prefer thin types like CustomerID instead of Int. But then again that information is usually conveyed in well-named variables and parameters.
Most of these problems seem to be more related to poor namespacing and modularity support in these untyped languages than to the fact that they're untyped per se. Although, yes, the amount of static analysis that you can do in a language with a richer type system is much greater, nonetheless...
In a language with good namespacing support and lexical scoping, regardless of its types, you can always tell people all the places they need to change when you rename a particular variable, because all names in a lexically scoped language are statically known (although this isn't true of languages that have first-class macros, like Kernel, or languages where you can mutate the namespace dynamically, like Lua or JavaScript + with). In fact, Clojure is one untyped language where you get name changes pointed out for you by the compiler. Unfortunately, Clojure will only point them out one at a time -- a tool could easily refactor the whole source for you, however.
Good support for modularity (and controlling side-effects) will constrain changes, such that you can be guaranteed that as long as you don't change the signature of any function, you'll only need local changes. It also guarantees that signature changes only affect clients of that module. With first-class modules, again, things are trickier because modules can be consumed and created dynamically. Type inference might help, depending on the language's semantics (Facebook's Flow type system is a good example of this).
In fact, controlling side-effects seems to be the major source of "maintainability gains" in a programming language, because it makes all changes as local as possible. Even just providing immutable data structures by default you get a lot of this locality, and I think Clojure is a good example of that today.
Well, other than that, there's research going on in type inference, soft typing, optional type systems, gradual type systems, and static contract verification (which is what I want for my own untyped language), which can help with these, although mixing these different solutions in a single language is a tricky thing :(, so it's much easier to start with a rich type system (like Haskell's or Scala's) and add some dynamism to it. And might also be the reason more interesting static analysis tools are written targeting those systems.
What we did at last.fm, which I understood was very similar to the Facebook stack, was to start with everything in PHP during the "exploratory" phase. But as a problem became well-understood and we knew what an API needed to be, we'd move it to being a backend service in Java, which the frontend called via Thrift. That way we could play to the strengths of both languages - use something very dynamic for the early phases, and something clear and maintainable for the long term.
(That said, nowadays I advocate Scala for everything - I think it's flexible enough that you can do exploration and quick/dirty scripts in it even though it has a static type system)
Chad, What are your thoughts on gradual typing, hybrid typing, optional typing, etc.?
I usually find myself agreeing with every article about dynamic/static typing (regardless of the side being argued) right up until it becomes a mutually exclusive decision between dynamic typing and static typing.
Seems like a gradual typing system would give you the agility where you need it, and the ability to ensure correctness where you want it.
I definitely recommend tests in statically-typed languages too! However, you don't always need perfect coverage just to verify that functions have the correct names and such. You can rely on the compiler for that.
IMVU has a huge amount of test coverage. Lack of test coverage isn't the problem. The problem is that the tests take 30 minutes to run (on a very wide cluster).
Also, statically-typed languages don't need to have extra boilerplate - consider Haskell or ML.
Controlling for effects is a huge win! I love that part of Haskell.
Thanks for the insight!
I love the idea of optional typing! PHP/Hack might be the first one to show us how it feels.
I enjoyed your article!
I'd be interested to know what your thoughts are about Julia, which is dynamically typed but provides the option of restricting method selection by specifying the types of a function's parameters.
> They're terrible but they're "impossible" to change, because the iteration time on running all the tests, verifying in production,
You mean they have highly coupled dependencies on other PHP code. Which implies bad design, but again...no qualification so who knows what this is supposed to mean.
> With a type system (such as in C++, Go, Haskell, or Java), the iteration time on renaming a function or changing its signature is a single build. In Haskell, it's even better: simply type `:r` into ghci and see all the places where the code needs to be updated
You then omit that you have to run your tests again. Wouldn't this be equally "impossible" to change, once you have a typed system with tightly coupled dependencies on similar code? I mean the tests are prohibitive midway in your rant, but they aren't the problem once you shift to a paradigm you are trying to make an argument for. This is intellectually dishonest.
In general, there are fewer cases for statically typed languages because programmers don't try to enforce emit types in dynamic languages in most teams. This is another bad design smell. Tests are for business logic/program flow, not to guard against your niche PHP typecasting fears. Stop allowing functions to emit bad values and you can decouple them. Bad tests for bad services is not uncommon, but it's not that hard to deduce.
Please take a moment to understand how your environment and bad practice has basically poisoned you against the multitude of tradeoffs. Being able to coerce types is GREAT for optimizing performance and repurposing algorithms at will. Lower cost to change is a business concern. Lower cost to hire is another. The gap (availability and cost) between a programmer who can intuitively grasp abstract types in Haskell vs dynamic types in a scripting language is a practical concern.
Well said. I think the reason dynamically typed languages are in vogue is that they fit the "Twitter world": speak before you think.
Only with experience does someone realize the folly of their ways.
Static types not only require some thought up front, but they also allow for the fact that even the best design may not be the best in 2 years, or 6 months, etc., and the requirements may have changed. With a statically typed language you can methodically "redesign" as needed. Without types, good luck. Can I hear "rewrite"? I think I do...
I don't understand. If it's just function signature change, you can run a global grep / find of all your files that use this function, right? Running tests is slow. But will grep also be too slow for your code base?
"I wonder how Facebook feels on this topic."
I think they agreed with your closing thought that long-term maintainability is more important. Case in point, they drastically remodeled PHP to make it usable, and that included adding a static typing system that works very well: http://hacklang.org/ Now they use Hack almost exclusively as opposed to vanilla PHP.