Why should you care about HPC?

Over the last decade, research universities have committed significant financial, infrastructural, and staff resources to developing high-performance computing (HPC) programs, usually with the entire campus research community as a target audience. Some fields, like astrophysics and nuclear engineering, have a long track record of training their students to use HPC, viewing it as an indispensable part of their research toolkit. For scholars working with datasets vastly larger than a modern laptop can store or process, or running computationally intensive simulations at scale, the value of HPC is self-evident. This is not the case for most humanities scholars. Even humanities scholars doing computationally intensive work (e.g. syntactic parsing using natural language processing) at relatively large scale (tens of thousands of texts) can usually accomplish the task within days, at most, on their own laptops.

If the scale of most computing in the humanities does not require HPC, why invest the time in learning its unfamiliar jargon and command-line syntax? For humanists, the biggest motivation for using HPC isn’t technical; it’s legal. There is a vast trove of in-copyright literature, documents, and other materials of interest to humanists that has been digitized (e.g. through university partnerships with Google for the Google Books project) or could be made digitally accessible (e.g. through circumventing technical protection measures, like cracking the encryption on ebooks). Policy-makers at multiple levels, however, have put requirements in place to protect these materials from being improperly accessed or released to the public. In short: there’s no technical reason why you can’t analyze in-copyright texts on your own laptop, but there may be policies that require these texts to be stored only on secure university servers and require you to use university compute infrastructure.

Your standard HPC system is not the infrastructure most humanists would create for their own use if given the opportunity to develop something from scratch. As part of their research computing programs, some institutions may offer easier-to-use, browser-accessible interfaces to the computing cluster that lower the barrier to entry while still meeting security requirements. The fact remains, though, that research computing support is expensive to develop and operate, and humanists are a very small fraction of the user base. While there is still space for humanists to advocate for interfaces that are more accessible to less-technical scholars (a group that includes many colleagues in the social sciences and even the biological sciences), improvements will come slowly, at best. If you want to work with in-copyright materials in a policy-approved way, it’s worth learning to work with HPC as it currently exists.