Difference between revisions of "LII:Organizational Memory and Laboratory Knowledge Management: Its Impact on Laboratory Information Flow and Electronic Notebooks"

From LIMSWiki
Revision as of 18:18, 10 April 2024

Title: Organizational Memory and Laboratory Knowledge Management: Its Impact on Laboratory Information Flow and Electronic Notebooks

Author for citation: Joe Liscouski, with editorial modifications by Shawn Douglas

License for content: Creative Commons Attribution-ShareAlike 4.0 International

Publication date: April 2024

Introduction

Beginning in the 1960s, the application of computing to laboratory work was focused on productivity: the reduction of the amount of work and cost needed to generate results, along with an improvement in the return on investment (ROI). This was very much a bottom-up approach, addressing the most labor-intensive issues and moving progressively to higher levels of data and information processing and productivity.

The efforts began with work at Perkin-Elmer, Nelson Analytical, Spectra Physics, Digital Equipment Corporation, and many others on the computer-controlled recording and processing of instrument data. Once we learned how to acquire the data, robotic tools were introduced to help process samples and make them ready for introduction into instruments, which, combined with a computer connection for data acquisition, further increased productivity. That was followed by an emphasis on the storage, management, and analysis of that data through the application of laboratory information management systems (LIMS) and other software. With the recent development of artificial intelligence (AI) systems and large language models (LLMs), we are ready to consider the next stage in automation and systems application: organizational memory and laboratory knowledge management.

This piece discusses the convergence of a set of technologies and their application to scientific work. The development of software systems like ChatGPT, Gemini, and others[1] means that with a bit of effort the ROI in research and testing can be greatly improved.

The initial focus here is on using LLMs to create an effective organizational memory (OM) and how that OM can benefit scientific organizations. Following that, we'll examine how that potential technology impacts information flow, integration, and productivity, as well as what it could mean for developing electronic laboratory notebooks (ELNs). We’ll also have to extend that discussion to having AI and OM systems work with LIMS, scientific data management systems (SDMS), instrument data systems (IDSs), engineering tools, and field work found in various industries.

This work is not a "how to acquire and implement" article but rather a prompt for "something to think about and pursue" if it makes sense within your organization. The idea is the creation of an effective OM (i.e., an extensive document and information database) that fills a gap in scientific and laboratory informatics, one that can be used effectively with an AI tool to search, organize, synthesize, and present material in an immediately applicable way. We need to seriously think about what we want from these systems and what our requirements are for them before the rapid pace of development produces products that need extensive modifications to be useful in scientific, laboratory, field, and engineering work.

Why should you read this?

Most of the products used in scientific work (whether in the lab, field, office, etc.) are designed for a specific application (working with instruments, for example) or adapted from general-purpose tools used in various industries and settings. The ideas discussed here need further development, as do tools built specifically for the needs of the scientific community. Still, that work needs to begin as a community effort if its benefits are to be realized. We need to guide the development of technologies so that they meet the needs of the scientific community rather than try to adapt them once they are delivered to the general marketplace.

LLM systems saw rapid development and deployment across almost every industry throughout 2023. Unless something drastic happens, development will only accelerate, given the potential impact on business operations and the interest of technology-driven companies. The scientific community needs to ensure not only that its unique needs (once they’ve been defined) are included and met in LLM development, but also that the resultant output reflects empirical rigor.

Organizational memory

A researcher came into our analytical lab and asked about some results reported a few years earlier. One chemist recalled the project as well as the person in charge of that work, who had since left the company. The researcher thought he had a better approach to the problem being studied in the original work and was asked to investigate it. The bad news is that all the work, both analytical and previous research notes, was written into paper laboratory notebooks (1960s). Because of their age, they had left the library and were stored in banker’s boxes in a trailer in the parking lot. There, they were subject to water damage and rodents. Most of that material was unusable, and the investigation was dropped.

Many laboratories have similar stories to the above, lamenting the loss of knowledge within the overall organization due to poor knowledge management practices. Knowledge management has been a human activity for thousands of years, ever since the first pictographs were placed on cave walls. The technology being used, the amount of knowledge generated, and our ability to work with it have changed over many centuries. Today, the subject of organizational knowledge management has seen evolving interest as organizations have moved from disparate archives and libraries of physical documents to more organized "computer-based organizational memories"[2] where a higher level of productivity can be had.

Walsh and Ungson define organizational memory as "stored information from an organization's history that can be brought to bear on present decisions," with that information being "stored as a consequence of implementing decisions to which they refer, by individual recollections, and through shared interpretations."[2] Until recently, many electronic approaches to OM development have relied on document management systems (DMSs) with keyword indexing and local search engines for retrieval. While those are a start, we need more; search engines still rely too heavily on people to sort through their output to find and organize relevant material.

Recently, particularly in 2023, AI systems like the notable ChatGPT[3] have offered a means of searching, organizing, and presenting material in a form that requires little additional human effort. Initial versions have had several issues (e.g., "hallucinations," a tame way of saying the AI fabricates and falsifies data and information[4]), but as new models and tools are developed to better address these issues[5][6], those AI systems may improve enough to eventually deliver on their potential. Outside of ChatGPT, there are similar systems available (e.g., Microsoft Copilot and Google Gemini), and more are likely under development. Our intent is not to make a comparison since any such effort will quickly become outdated.

Why are organizational memory systems important?

Research and development, along with the laboratory activities that support it, can be expensive. ROI is one measure of the wisdom behind the investment in that work, and it can be substantively affected by the informatics environment within the laboratory and the larger organization of which it is a part. We'll take a brief look at three approaches to OM systems: paper-based, electronic, and AI-driven systems.

1. Paper-based systems: Paper-based systems pose a high risk of knowledge loss. While paper notebooks are in active use, the user knows the contents and can find material quickly. However, once the notebook is filled and put first in a library and then in an archive, the memory of what is in it fades. Once the original contributor leaves the post (due to promotion, transfer, or outside employment), you’re left depending on someone's recall or brute-force searching to retrieve the contents. The cost of using that paper-based work and trying to gain benefit from it increases significantly, and the benefit is questionable, depending on whether the information can be found, understood, and put to use. All of this assumes that the material hasn’t been damaged or lost. Paper-based lab notebooks create a knowledge bottleneck. Digital solutions are needed for secure, long-term storage and efficient searchability of experimental data.

2. Electronic systems and search engines: Analytical and experimental reports, as well as other organizational documents, can be entered into a DMS with suitable keyword entries (i.e., metadata)[7], indexed, and searched via search engines local to the organization or lab. The problem with this approach is that you get a list of reference documents that must be reviewed manually to ferret out and organize relevant content, which is time-consuming and expensive. This work has to be prioritized along with other demands on people’s time. If a LIMS (whether a true LIMS or a LIMS-like spreadsheet implementation) or an SDMS is used, the search may not include material in those systems and may instead be limited to descriptions in reports. Until AI tools became widely available in 2023, readily available capabilities were limited in these ways, and only organizations with substantial budgets and resources could independently pursue more comprehensive technologies.

3. AI-driven systems: Building upon electronic systems with query capability, we can use the stored documents to train and update an AI assistant (a special purpose variation of ChatGPT, Watsonx 5, or other AI, for example). Variations can be created that are limited to private material to provide data security, and later they may be extended to public documents on the internet with controls to avoid information leakage. Based on the material available to date and at least one user’s experience using ChatGPT v4, the results the AI system returned for a search question were more comprehensive, better organized, and presented in a readable and usable fashion that made them immediately useful, instead of simply providing a starting point for further research work. One change noted from earlier AI models is a lower tendency to provide false references, and the references provided are seemingly more relevant, better summarized, and more accurate. (Note: Any information an AI provides should be checked for accuracy before use.) An additional benefit is that its incorporation becomes synergistic as more material is provided. Connecting an AI to a LIMS or SDMS would provide additional benefits. However, extreme care must be taken to prevent premature disclosure of results before they are signed off, and data security has to be a high priority.
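
To make the contrast between the second and third approaches more concrete, the following is a minimal Python sketch, not a description of any real DMS, LIMS, or AI product. It assumes a hypothetical Document record with illustrative field names (doc_id, title, keywords, signed_off). The keyword search shows why the second approach only returns a list of documents that someone must still read, and the sign-off filter shows one simple way unapproved results might be kept away from an AI assistant.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A hypothetical DMS record; every field name here is an assumption."""
    doc_id: str
    title: str
    keywords: frozenset
    signed_off: bool  # True once the results have been reviewed and approved


def build_keyword_index(documents):
    """Map each lowercase keyword to the set of document IDs tagged with it."""
    index = defaultdict(set)
    for doc in documents:
        for keyword in doc.keywords:
            index[keyword.lower()].add(doc.doc_id)
    return index


def keyword_search(index, terms):
    """Return the sorted IDs of documents tagged with every search term."""
    id_sets = [index.get(term.lower(), set()) for term in terms]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


def releasable_to_assistant(documents):
    """Keep only signed-off documents so unapproved results are not passed on."""
    return [doc for doc in documents if doc.signed_off]


# Illustrative records only; these are not taken from any real notebook.
documents = [
    Document("NB-001", "Polymer viscosity study",
             frozenset({"polymer", "viscosity"}), True),
    Document("NB-002", "Polymer degradation assay",
             frozenset({"polymer", "degradation"}), False),
    Document("NB-003", "Solvent purity report",
             frozenset({"solvent", "purity"}), True),
]

index = build_keyword_index(documents)

# Approach 2: the search yields document IDs, not organized content.
hits = keyword_search(index, ["Polymer"])
print(hits)

# Approach 3 safeguard: only approved material would reach the assistant.
approved_ids = [doc.doc_id for doc in releasable_to_assistant(documents)]
print(approved_ids)
```

In this sketch the keyword search stops at a list of identifiers, which is the manual-review burden described above. A real deployment would need far more than a boolean flag for sign-off and data security, so the filter should be read only as a placeholder for that control point.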


Acknowledgements

I’d like to thank Gretchen Boria for her help in improving this article and her contributions to it.

Footnotes

About the author

Initially educated as a chemist, author Joe Liscouski (joe dot liscouski at gmail dot com) is an experienced laboratory automation/computing professional with over forty years of experience in the field, including the design and development of automation systems (both custom and commercial systems), LIMS, robotics and data interchange standards. He also consults on the use of computing in laboratory work. He has held symposia on validation and presented technical material and short courses on laboratory automation and computing in the U.S., Europe, and Japan. He has worked/consulted in pharmaceutical, biotech, polymer, medical, and government laboratories. His current work centers on working with companies to establish planning programs for lab systems, developing effective support groups, and helping people with the application of automation and information technologies in research and quality control environments.

References

  1. Malhotra, T. (30 January 2024). "This AI Paper Unveils the Future of MultiModal Large Language Models (MM-LLMs) – Understanding Their Evolution, Capabilities, and Impact on AI Research". Marktechpost. Marketechpost Media, LLC. https://www.marktechpost.com/2024/01/30/this-ai-paper-unveils-the-future-of-multimodal-large-language-models-mm-llms-understanding-their-evolution-capabilities-and-impact-on-ai-research/. Retrieved 10 April 2024. 
  2. Walsh, James P.; Ungson, Gerardo Rivera (1 January 1991). "Organizational Memory". The Academy of Management Review 16 (1): 57. doi:10.2307/258607. http://www.jstor.org/stable/258607?origin=crossref.
  3. "ChatGPT 3.5". OpenAI OpCo, LLC. https://chat.openai.com/. Retrieved 10 April 2024. 
  4. Emsley, Robin (19 August 2023). "ChatGPT: these are not hallucinations – they’re fabrications and falsifications". Schizophrenia 9 (1): 52. doi:10.1038/s41537-023-00379-4. ISSN 2754-6993. PMC PMC10439949. PMID 37598184. https://www.nature.com/articles/s41537-023-00379-4.
  5. Fabbro, R. (29 March 2024). "Microsoft is apprehending AI hallucinations — and not just its own". Quartz. https://qz.com/microsoft-azure-ai-hallucinations-chatbots-1851374390. Retrieved 10 April 2024. 
  6. Maurin, N. (15 March 2024). "The bank quant who wants to stop gen AI hallucinating". Risk.net. https://www.risk.net/risk-management/7959062/the-bank-quant-who-wants-to-stop-gen-ai-hallucinating. Retrieved 10 April 2024. 
  7. "About DCMI". Association for Information Science and Technology. https://www.dublincore.org/about/. Retrieved 10 April 2024.