I was recently a panelist at the 31st Protein Structure Determination in Industry Meeting (PSDI 2023), where we touched on the role of cloud computing in cryo-EM processing. Our discussion highlighted several misunderstandings and debated points about using cloud technology in this field. Reflecting on those discussions, I feel it's time for a dedicated blog post in which we can explore these misconceptions, contentious issues, and the unexplored possibilities of cryo-EM in the cloud together.
Cryo-EM has long grappled with the misconception that its large data volumes are too unwieldy to transfer to the cloud for processing. This view is directly contradicted by the experience in genomics, particularly by Next-Generation Sequencing (NGS) instruments like Illumina's NovaSeq X Plus. In its largest setup, this sequencer transfers 16 terabytes (TB) of data to the cloud in about 48 hours. This example illustrates that transferring large datasets is not only manageable but has become routine in fields like genomics. Moreover, pharmaceutical companies that use NGS have accrued extensive experience in big data transfer, further reinforcing the practicality of managing large-scale data in cloud environments.
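To put the sequencing figure in perspective, the quoted 16 TB in 48 hours can be converted into a sustained network rate. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 10^12 bytes), ignores protocol overhead, and the function names and the 5 TB dataset in the second example are hypothetical illustrations, not figures from the meeting.

```python
def required_bandwidth_gbps(terabytes: float, hours: float) -> float:
    """Sustained bandwidth in Gbit/s needed to move `terabytes` in `hours`.

    Assumes decimal units (1 TB = 1e12 bytes) and no protocol overhead.
    """
    bits = terabytes * 1e12 * 8
    seconds = hours * 3600
    return bits / seconds / 1e9


def transfer_hours(terabytes: float, gbps: float) -> float:
    """Hours needed to move `terabytes` over a link sustaining `gbps` Gbit/s."""
    bits = terabytes * 1e12 * 8
    return bits / (gbps * 1e9) / 3600


# Figure quoted in the text: 16 TB in about 48 hours.
print(f"{required_bandwidth_gbps(16, 48):.2f} Gbit/s")  # 0.74 Gbit/s

# Hypothetical example: a 5 TB dataset over a link sustaining 1 Gbit/s.
print(f"{transfer_hours(5, 1.0):.1f} h")  # 11.1 h
```

In other words, the sequencing example corresponds to well under 1 Gbit/s of sustained throughput.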
A prevailing assumption within the cryo-EM community is that migrating to cloud computing is prohibitively expensive, potentially draining research budgets. This view is often based on a direct comparison with traditional on-premise solutions. However, such comparisons fail to account for the different financial models and resource utilization strategies inherent to each approach. On-premise infrastructure requires substantial upfront investment and a commitment to long-term maintenance, which can be inefficient given the variable nature of cryo-EM projects. This often leads to a mismatch between resource availability and actual usage, resulting in inefficiently used infrastructure and tied-up capital. In contrast, cloud computing, with its adaptable and demand-responsive model, offers a more cost-effective solution for the fluctuating demands of cryo-EM research. The cloud's pay-as-you-go model is inherently aligned with the ebb and flow of cryo-EM workloads. This model eliminates the need for large upfront costs, allowing researchers to start with minimal investment and scale resources according to their project's demands. The flexibility to adjust resource usage based on real-time needs and the absence of initial capital expenditure make cloud platforms highly cost-effective for cryo-EM. Furthermore, the option to experiment with usage patterns before committing to long-term contracts or volume discounts enables a more strategic approach to cost management. By understanding these nuances, the cryo-EM community can recognize the cloud as a financially viable and strategically advantageous solution, offering both scalability and cost-efficiency in line with the unique demands of their research.
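One way to make this comparison concrete is to ask how busy an on-premise system must be before it matches the cost of renting equivalent capacity on demand. The following is a minimal sketch under simplifying assumptions: straight-line amortization of the purchase price, a single flat cloud hourly rate, and no storage, data transfer, or discount effects. All function names and the numbers in the usage example are hypothetical placeholders, not vendor quotes.

```python
HOURS_PER_YEAR = 8760


def onprem_annual_cost(capex: float, lifetime_years: float, annual_opex: float) -> float:
    """Yearly on-premise cost: amortized purchase price plus running costs."""
    return capex / lifetime_years + annual_opex


def breakeven_utilization(capex: float, lifetime_years: float,
                          annual_opex: float, cloud_hourly_rate: float) -> float:
    """Fraction of the year the hardware must be busy for on-premise to
    cost the same as paying `cloud_hourly_rate` only for hours actually used.
    """
    yearly = onprem_annual_cost(capex, lifetime_years, annual_opex)
    return yearly / (cloud_hourly_rate * HOURS_PER_YEAR)


# Placeholder inputs for illustration only (not real prices):
# 100,000 purchase, 5-year lifetime, 10,000/year upkeep, 10/hour in the cloud.
u = breakeven_utilization(100_000, 5, 10_000, 10.0)
print(f"break-even utilization: {u:.0%}")  # 34%
```

With these placeholder inputs, the on-premise system would need to be busy more than about a third of the year to come out cheaper; below that utilization, the pay-as-you-go model wins. Real decisions would need actual quotes and the cost items this sketch leaves out.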
Managing vast and intricate datasets is one of the most significant challenges in cryo-EM. The cloud, with its versatile and scalable computing power, offers a compelling solution to this complex issue. It provides the high-performance storage options that GPU-intensive cryo-EM tasks require, facilitating rapid data processing. This scalability is not just about handling large volumes of data but also about adapting seamlessly to fluctuating computational demands.
The automated transfer feature of cloud services is a game-changer, efficiently moving data between high-speed and ultra-cold storage as needed. This functionality greatly simplifies the task of managing data, from active analysis to long-term archiving. Additionally, the economic aspect of cloud computing, particularly for storing extensive data sets, is striking. For example, the cost of storing 1 petabyte (PB) of data in ultra-cold storage for a year can be as low as $12,000, making it a highly cost-effective option compared to traditional storage solutions.
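The quoted archive price is easier to compare with other storage options once it is broken down into a per-gigabyte monthly rate. The sketch below simply rescales the figure from the text, assuming decimal units (1 PB = 1,000 TB = 1,000,000 GB). It covers the storage charge only; retrieval and data transfer fees, which providers typically bill separately, are not modeled. The function names and the 200 TB example are hypothetical.

```python
QUOTED_USD_PER_PB_YEAR = 12_000  # figure quoted in the text


def usd_per_gb_month(usd_per_pb_year: float = QUOTED_USD_PER_PB_YEAR) -> float:
    """Convert a price per PB-year into a price per GB-month (decimal units)."""
    return usd_per_pb_year / 1_000_000 / 12


def archive_cost_usd(terabytes: float, months: float,
                     usd_per_pb_year: float = QUOTED_USD_PER_PB_YEAR) -> float:
    """Storage-only cost of keeping `terabytes` archived for `months`."""
    petabytes = terabytes / 1000
    return petabytes * usd_per_pb_year * months / 12


print(f"${usd_per_gb_month():.4f} per GB-month")  # $0.0010 per GB-month

# Hypothetical example: 200 TB of raw data archived for three years.
print(f"${archive_cost_usd(200, 36):.0f}")  # $7200
```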
Another common misconception is that cloud platforms may not be secure enough for sensitive data. In fact, cloud services provide robust security measures that often exceed those of traditional IT infrastructures. A central component of this security is the use of Virtual Private Clouds (VPCs). VPCs keep customer data within a secure, isolated environment on the cloud provider's network, significantly mitigating the risk of external breaches. This level of security is trusted by entities managing highly sensitive data, including NGS labs, major financial institutions, and three-letter agencies such as the NSA or CIA. The confidence these organizations place in cloud security underlines its appropriateness for cryo-EM data management and shows that cloud platforms are more than capable of addressing the security needs of the field.
In cryo-EM, performance is paramount in achieving better structural insights quickly. On-premises hardware procurement can be costly, and by the time it's implemented, it may not represent the latest generation. In contrast, cloud computing, with its infrastructure-as-code approach, ensures access to cutting-edge hardware continuously. This competitive advantage enables researchers to expedite critical structure determination, staying ahead in the race for scientific discovery. Additionally, cloud flexibility allows users to choose the right "vehicle" for the task, whether a high-performance "Ferrari" or a cost-efficient option, optimizing both speed and efficiency.
Cryo-EM industry professionals often find themselves longing for the more freewheeling IT practices prevalent in academia, a landscape reminiscent of the 'Wild West.' Traditionally, academic cryo-EM labs have enjoyed a level of autonomy in their IT practices, often characterized by "Shadow IT": the use of unsanctioned hardware and software to advance research. This approach, fueled by frustrations with the limitations of centrally provided IT resources, led many PIs to independently purchase high-performance gaming PCs with advanced GPUs for their labs. These systems, though beneficial for research, operate outside institutional oversight and are increasingly raising concerns over security and resource management.
This shift in perspective stems from the realization that even unintentional misuse or vulnerability in such unmonitored computing setups can have severe consequences. While most scientists are ethical, the potential for misuse is real. First of all, it only takes one bad apple. More importantly, in an era that is rife with cybercrime and where AI has made cyberattacks more accessible, institutions are becoming increasingly risk-averse. One can therefore expect that universities, starting with those with substantial endowments, will stop tolerating these unmonitored and high-risk computing configurations in laboratories. The move towards robust oversight is on the horizon, making it crucial for cryo-EM researchers to prepare for a future where the days of "Shadow IT" are numbered. Rather than the corporate world adapting to the more relaxed IT norms of academia, the current trend points the other way: academic institutions are increasingly aligning with industry standards in IT management, recognizing the need for stringent security and oversight.
The trend towards cloud computing is underscored by a growing commitment to environmental sustainability. As global awareness and regulatory pressures regarding carbon emissions increase, the need for greener, more energy-efficient computing solutions becomes paramount. Cloud computing emerges as a vital solution in this regard.
Data centers that power cloud services are engineered with environmental sustainability as a priority. They typically employ advanced cooling technologies, optimized server utilization, and increasingly rely on renewable energy sources, such as solar and wind power. This optimization translates to a significant reduction in energy consumption per unit of data processed, compared to traditional data centers. The centralized nature of cloud services also means that resources are used more efficiently, reducing the overall environmental footprint of computational activities. As a result, cloud computing is not just a technological and cost-saving choice but a strategic one, aligning cryo-EM research with the broader, ever-important goal of environmental responsibility.
Most cryo-EM research entities share similar needs in terms of data processing, analysis, and storage workflows. The traditional approach of each entity setting up and maintaining its unique IT infrastructure not only leads to a duplication of efforts but also hinders interoperability and resource optimization. By transitioning to cloud-based solutions, these entities can leverage a shared, standardized environment that streamlines operations, reducing the time and resources spent on individualized IT setup and maintenance.
Furthermore, the adoption of cloud infrastructure in cryo-EM opens the door for pre-competitive collaboration among companies. Recognizing the shared benefits, companies can collectively invest in developing robust cloud backends that enhance the efficiency of cryo-EM applications in drug discovery. Such collaborative efforts can lead to the creation of powerful, shared resources that include advanced computational tools, standardized data repositories, and collaborative platforms, thereby accelerating the pace of research and discovery.
Cloud computing's transformative impact in cryo-EM is most evident in its ability to foster collaboration beyond traditional institutional boundaries. In typical academic settings, competition for limited computational resources like clusters can occur within an institution. Cloud computing, however, offers a dynamic solution by allowing for the creation of dedicated clusters for specific projects. This capability enables researchers to collaborate effectively with peers, both within and outside their organizations, forming groups based on actual research needs rather than institutional constraints. This shift not only leads to more efficient use of computational power but also enhances the focus and productivity of research collaborations.
Cloud computing's scalability and flexibility open up entirely new possibilities for research, such as the integration of cryo-EM with other advanced technologies. One example is the combination of cryo-EM with NGS for antibody discovery in the cryoEMPEM approach, a topic explored in a previous blog post. Additionally, cloud computing makes the concept of connecting cryo-EM facilities in an IoT-like network more achievable. This connectivity facilitates collaborative platforms for shared data sets and resources, as discussed in another past post, enabling researchers to work together seamlessly, irrespective of their location.
This approach shifts the question from "What can I achieve with my current resources?" to "What are my scientific objectives, and how can I realize them using available technology?" Cloud computing's adaptability to a project's specific requirements – be it processing large datasets or collaborating remotely – signifies a major advancement in cryo-EM research, enabling more innovative and expansive scientific exploration.
The shift towards cloud presents a significant opportunity for facility managers in academic settings, particularly in terms of enhancing collaboration with industrial users from pharmaceutical, biotech, and contract research organizations (CROs). Engaging with these industry partners often proves to be more rewarding for facility managers. Industrial samples tend to be of higher quality, the approach to research is more pragmatic, and the projects are typically more applied. Moreover, these interactions with the industry can lead to fruitful career opportunities.
One of the key aspects of working with industrial partners is the involvement in data processing, which is crucial for facility managers who aspire to be more than just operators. Traditional collaborations often run into challenges like data security concerns or congestion on the university's cluster when processing industrial projects. Cloud computing offers a solution by allowing facility managers to set up dedicated clusters for each industrial project, providing secure and exclusive space for customer data. This approach not only streamlines the workflow but also ensures compliance with data privacy and security standards, which are paramount in industrial collaborations.
Furthermore, gaining expertise in cloud technology can open additional career pathways for facility managers in the rapidly evolving world of technology. The tech industry, known for its lucrative job opportunities and massive financial turnovers, contrasts with the more modest financial landscape of the life sciences sector. By becoming proficient in cloud-based technologies and methodologies, cryo-EM facility managers not only enhance their current roles but also position themselves for potentially high-paying career opportunities in the tech sector. This shift could be a strategic move for those looking to transition into an industry where the financial rewards and scope for innovation are significantly higher. Therefore, embracing cloud computing in cryo-EM facilities is not just a technological upgrade; it's a strategic career move for facility managers seeking broader opportunities and professional growth.
Adopting cloud is crucial for both advancing research and attracting young, talented researchers. This new generation, adept in cloud technologies, views traditional on-premises IT infrastructure as outdated and limiting. For these young talents, being confined to congested on-premises systems can be a major deterrent. It signals a lack of progressiveness and can lead to disillusionment, especially when they are aware of the advanced capabilities and opportunities afforded by cloud-based systems. They would rather work with organizations that embrace cloud technologies, seeing this as a hallmark of modern, forward-thinking research environments. Institutions that adopt cloud computing demonstrate their commitment to staying at the forefront of scientific innovation and align with the culture and expectations of these 'cloud natives.'
Given the many advantages, why has cloud computing not fully transformed cryo-EM yet? There are two main reasons. First, there is a general tendency within the field to prioritize the purchase of microscopes while underinvesting in IT infrastructure. This approach is somewhat perplexing considering that cryo-EM is inherently a digital workflow. From protein purification to solving structures, the majority of the process is conducted in silico. Despite this, IT often remains an afterthought in budgeting and planning. This underinvestment in the IT aspect of cryo-EM contributes to the meager output of structures that many facilities report, a common grievance among facility managers.
The second reason stems from a misunderstanding of how to harness the cloud's potential. While the benefits of cloud computing for cryo-EM are significant, they are not automatically realized. The cloud should be seen more as a supply store than as a ready-made solution. It requires the right partners with expertise in both scientific computing and the specific domain of cryo-EM to architect an effective IT platform. In this regard, companies like Clovertex, which are cloud supercomputing experts for pharma and possess domain expertise in cryo-EM IT, are crucial players in bringing the benefits of cloud to fruition. Establishing a robust and efficient cryo-EM IT platform requires investment in skilled partners who can tailor cloud solutions to the unique needs of the field.
In conclusion, to truly leverage the advantages of cloud computing in cryo-EM, a shift in mindset is needed. Institutions and research facilities must recognize the importance of IT infrastructure as a foundational element of cryo-EM workflows. Furthermore, forging strong partnerships with knowledgeable cloud experts who understand both the technical and scientific aspects of cryo-EM is crucial. Investing in these areas should be the first step for anyone looking to develop a successful and productive cryo-EM platform.
1. How Clovertex blends bespoke and off-the-rack cloud computing in Pharma
2. Big in Japan: IoT cryo-EM as a leap towards ultimate automation
3. À la carte monoclonals using cryoEMPEM for antibody discovery