Difference between revisions of "Grid computing"

From LIMSWiki
Additionally, differences in programming and deployment exist. It can be costly and difficult to write programs that can run in the environment of a supercomputer, which may have a custom operating system or require the program to address concurrency issues.<ref name="GrahamSuper">{{cite book |url=http://books.google.com/books?id=Llhr49iPJNIC&pg=PA134 |title=Getting Up To Speed: The Future of Supercomputing |author=Graham, Susan L.; Snir, Marc; Patterson, Cynthia A. |publisher=The National Academies Press |year=2005 |pages=134–135 |isbn=9780309165518 |accessdate=17 September 2014}}</ref> If a problem can be adequately parallelized, a "thin" layer of "grid" infrastructure can allow conventional, standalone programs — given a different part of the same problem — to run on multiple machines. This makes it possible to write and debug on a single conventional machine, and it eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.


==Design considerations and variations==
==Design considerations==
One feature of distributed grids is that they can be formed from computing resources belonging to multiple individuals or organizations (known as multiple [[administrative domain]]s). This can facilitate commercial transactions, as in [[utility computing]], or make it easier to assemble [[volunteer computing]] networks.
One feature of distributed grids is that they can be formed from computing resources belonging to multiple administrative domains. This can facilitate commercial transactions, as in utility computing, or make it easier to assemble volunteer computing networks. However, the computers which are actually performing the calculations might not be entirely trustworthy, requiring additional security measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results. Authentication, authorization, and encryption methods must all be employed to ensure "the integrity and confidentiality of the data processed within the grid."<ref name="JacobIBMGrid">{{cite book |url=http://www.redbooks.ibm.com/redbooks/pdfs/sg246778.pdf |format=PDF |title=Introduction to Grid Computing |author=Jacob, Bart; Brown, Michael; Fukui, Kentaro; Trivedi, Nihar |publisher=IBM |year=December 2005 |pages=248 |accessdate=17 September 2014}}</ref>


One disadvantage of this feature is that the computers which are actually performing the calculations might not be entirely trustworthy. The designers of the system must thus introduce measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results, and from using the system as an attack vector. This often involves assigning work randomly to different nodes (presumably with different owners) and checking that at least two different nodes report the same answer for a given work unit. Discrepancies would identify malfunctioning and malicious nodes.
The impacts of trust and availability on performance and development can influence the choice of whether to deploy onto a dedicated cluster, to idle machines internal to the developing organization, or to an open external network of volunteers or contractors.<ref name="JacobIBMGrid" /> In many cases, the participating nodes must trust the central system not to abuse the access that is being granted, by interfering with the operation of other programs, mangling stored information, transmitting private data, or creating new security holes. Other systems employ measures to reduce the amount of trust "client" nodes must place in the central system, such as placing applications in virtual machines.


Due to the lack of central control over the hardware, there is no way to guarantee that [[Node (computer science)|nodes]] will not drop out of the network at random times. Some nodes (like laptops or [[dialup]] Internet customers) may also be available for computation but not network communications for unpredictable periods. These variations can be accommodated by assigning large work units (thus reducing the need for continuous network connectivity) and reassigning work units when a given node fails to report its results in expected time.
Public systems or those crossing administrative domains (including different departments in the same organization) often result in the need to run on heterogeneous systems, using different operating systems and hardware architectures.<ref name="JacobIBMGrid" /> With many languages, there is a trade off between investment in software development and the number of platforms that can be supported (and thus the size of the resulting network). Cross-platform languages can reduce the need to make this trade off, though potentially at the expense of high performance on any given node (due to run-time interpretation or lack of optimization for the particular platform).<ref name="PlaszczakGrid">{{cite book |url=http://books.google.com/books?id=ZyEoEn0_ITIC&pg=PA38&lpg=PA38 |title=Grid Computing: The Savvy Manager's Guide |author=Plaszczak, Pawel; Wellner, Jr., Richard |publisher=Elsevier |year=2005 |pages=38–39 |isbn=9780080470764 |accessdate=17 September 2014}}</ref>
 
The impacts of trust and availability on performance and development difficulty can influence the choice of whether to deploy onto a dedicated cluster, to idle machines internal to the developing organization, or to an open external network of volunteers or contractors. In many cases, the participating nodes must trust the central system not to abuse the access that is being granted, by interfering with the operation of other programs, mangling stored information, transmitting private data, or creating new security holes. Other systems employ measures to reduce the amount of trust “client” nodes must place in the central system such as placing applications in virtual machines.
 
Public systems or those crossing administrative domains (including different departments in the same organization) often result in the need to run on [[heterogeneous]] systems, using different [[operating systems]] and [[computer architecture|hardware architectures]]. With many languages, there is a trade off between investment in software development and the number of platforms that can be supported (and thus the size of the resulting network). [[Cross-platform]] languages can reduce the need to make this trade off, though potentially at the expense of high performance on any given [[Node (computer science)|node]] (due to run-time interpretation or lack of optimization for the particular platform). There are diverse scientific and commercial projects to harness a particular associated grid or for the purpose of setting up new grids. [[BOINC]] is a common one for various academic projects seeking public volunteers; more are listed at the [[Grid computing#See also|end of the article]].
 
In fact, the middleware can be seen as a layer between the hardware and the software. On top of the middleware, a number of technical areas have to be considered, and these may or may not be middleware independent. Example areas include [[Service level agreement|SLA]] management, Trust and Security, Virtual organization management, License Management, Portals and Data Management. These technical areas may be taken care of in a commercial solution, though the cutting edge of each area is often found within specific research projects examining the field.
 
===CPU scavenging===
'''CPU-scavenging''', '''cycle-scavenging''', or '''shared computing''' creates a “grid” from the unused resources in a network of participants (whether worldwide or internal to an organization). Typically this technique uses desktop computer [[instruction cycle]]s that would otherwise be wasted at night, during lunch, or even in the scattered seconds throughout the day when the computer is waiting for user input or slow devices. In practice, participating computers also donate some supporting amount of disk storage space, RAM, and network bandwidth, in addition to raw CPU power.{{Citation needed |date= July 2013}}
 
Many [[volunteer computing]] projects, such as [[BOINC]], use the CPU scavenging model. Since [[Node (computer science)|nodes]] are likely to go "offline" from time to time, as their owners use their resources for their primary purpose, this model must be designed to handle such contingencies.<ref>Kamran Karimi, Neil G. Dickson, and Firas Hamze, High-Performance Physics Simulations Using Multi-Core CPUs and GPGPUs in a Volunteer Computing Context, International Journal of High Performance Computing Applications, 2011</ref>


==Projects and applications==
==Projects and applications==


Grid computing offers a way to solve [[Grand Challenge problem]]s such as [[protein folding]], financial [[model (abstract)|modeling]], [[earthquake]] simulation, and [[climate]]/[[weather]] modeling. Grids offer a way of using the information technology resources optimally inside an organization. They also provide a means for offering information technology as a [[utility computing|utility]] for commercial and noncommercial clients, with those clients paying only for what they use, as with electricity or water.
Grid computing offers a way to solve large-scale problems such as protein folding, financial modeling, and geographic and meteorological simulations. Grids offer a way of using the information technology resources optimally inside an organization. Examples include:


Grid computing is being applied by the National Science Foundation's National Technology Grid, NASA's Information Power Grid, Pratt & Whitney, Bristol-Myers Squibb Co., and American Express.{{Citation needed|date=February 2007}}
* {{As of|September 2014}} the open-source Berkeley Open Infrastructure for Network Computing (BOINC) platform was being used by over 3.2 million users. Of those users, 1.5 million users were attached to SETI@home, a project dedicated to detecting intelligent life beyond Earth, achieving an average of 1,969 TeraFLOPS (floating point operations per second).<ref name="SETIFLOPS">{{cite web |url=http://www.allprojectstats.com/po.php?projekt=15 |title=Project statistics - SETI@home |work=All Project Stats |publisher=BOINC |accessdate=17 September 2014}}</ref>


One cycle-scavenging network is [[SETI@home]], which was using more than 3 million computers to achieve 23.37 sustained [[FLOPS|teraflops]] (979 lifetime teraflops) {{As of|2001|alt=as of September 2001}}.<ref>[http://setiathome.ssl.berkeley.edu/totals.html ]{{dead link|date=July 2010}}</ref>
* {{As of|September 2014}}, Folding@home, a project dedicated to disease research, achieves 39,990 TeraFLOPS from over 172,000 users.<ref name="FHStats">{{cite web |url=http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats2 |title=Folding@home Client statistics by OS |work=Folding@home |publisher=Pande Lab, Stanford University |accessdate=17 September 2014}}</ref>


As of August 2009 [[Folding@home]] achieves more than 4 petaflops on over 350,000 machines.
* {{As of|September 2014}}, the Worldwide LHC Computing Grid — dedicated to handling nearly 30 petabytes of data per year from the Large Hadron Collider — involves 40 countries, 170 computing centers, and two million jobs run every day, making it one of the largest grid computing projects ever.<ref name="WLCG">{{cite web |url=http://wlcg-public.web.cern.ch/ |title=Welcome - Worldwide LHC Computing Grid |publisher=CERN |accessdate=17 September 2014}}</ref>


The [[European Union]] funded projects through the [[framework programme]]s of the [[European Commission]]. [[BEinGRID]] (Business Experiments in Grid) was a research project funded by the European Commission<ref>[http://www.beingrid.eu/ Home page of BEinGRID]</ref> as an [[Integrated Project (EU)|Integrated Project]] under the [[Sixth Framework Programme]] (FP6) sponsorship program. Started on June 1, 2006, the project ran 42 months, until November 2009. The project was coordinated by [[Atos Origin]]. According to the project fact sheet, their mission is “to establish effective routes to foster the adoption of grid computing across the EU and to stimulate research into innovative business models using Grid technologies”. To extract best practice and common themes from the experimental implementations, two groups of consultants are analyzing a series of pilots, one technical, one business. The project is significant not only for its long duration, but also for its budget, which at 24.8 million Euros, is the largest of any FP6 integrated project. Of this, 15.7 million is provided by the European commission and the remainder by its 98 contributing partner companies. Since the end of the project, the results of BEinGRID have been taken up and carried forward by [http://www.it-tude.com IT-Tude.com].
==Further reading==


The Enabling Grids for E-sciencE project, based in the [[European Union]] with sites in Asia and the United States, was a follow-up project to the European DataGrid (EDG) and evolved into the [[European Grid Infrastructure]]. This, along with the [[LHC Computing Grid]]<ref>[http://lcg.web.cern.ch/LCG/ Large Hadron Collider Computing Grid official homepage]</ref> (LCG), was developed to support experiments using the [[CERN]] [[Large Hadron Collider]]. A list of active sites participating within LCG can be found online<ref>{{cite web|url=http://goc.grid.sinica.edu.tw/gstat/ |title=GStat 2.0&nbsp;– Summary View&nbsp;– GRID EGEE |publisher=Goc.grid.sinica.edu.tw |accessdate=July 29, 2010}}</ref> as can real-time monitoring of the EGEE infrastructure.<ref>{{cite web|url=http://gridportal.hep.ph.ic.ac.uk/rtm/ |title=Real Time Monitor |publisher=Gridportal.hep.ph.ic.ac.uk |accessdate=July 29, 2010}}</ref> The relevant software and documentation are also publicly accessible.<ref>{{cite web|url=http://lcg.web.cern.ch/LCG/activities/deployment.html |title=LCG&nbsp;– Deployment |publisher=Lcg.web.cern.ch |accessdate=July 29, 2010}}</ref> There is speculation that dedicated fiber optic links, such as those installed by CERN to address the LCG's data-intensive needs, may one day be available to home users thereby providing internet services at speeds up to 10,000 times faster than a traditional broadband connection.<ref>[http://www.timesonline.co.uk/tol/news/science/article3689881.ece "Coming soon: superfast internet"]</ref>
* {{cite book |url=http://www.redbooks.ibm.com/redbooks/pdfs/sg246778.pdf |format=PDF |title=Introduction to Grid Computing |author=Jacob, Bart; Brown, Michael; Fukui, Kentaro; Trivedi, Nihar |publisher=IBM |year=December 2005 |pages=248}}
 
The [[distributed.net]] project was started in 1997.
The [[NASA Advanced Supercomputing facility]] (NAS) ran [[genetic algorithm]]s using the [[Condor cycle scavenger]] running on about 350 [[Sun Microsystems]] and [[Silicon Graphics|SGI]] workstations.
 
In 2001, [[United Devices]] operated the [[United Devices Cancer Research Project]] based on its [[Grid MP]] product, which cycle-scavenges on volunteer PCs connected to the Internet. The project ran on about 3.1 million machines before its close in 2007.<ref>[http://www.grid.org/stats/ ]{{dead link|date=July 2010}}</ref>
 
As of 2011, over 6.2 million machines running the open-source [[Berkeley Open Infrastructure for Network Computing]] (BOINC) platform are members of the [[World Community Grid]], which tops the processing power of the current fastest supercomputer system (China's [[Tianhe-I]]).<ref>[http://boincstats.com BOINCstats]</ref>


==See also==
==See also==
==References==
==References==
<references />
<references />
<!---Place all category tags here-->
[[Category:Software and hardware terms]]

Latest revision as of 15:17, 20 September 2022

Grid computing is the use of a shared "infrastructure that bonds and unifies globally remote and diverse [computing] resources"[1] for the purposes of completing one or more computational tasks that would otherwise require significantly more time if performed on any single machine. Grid computing is a form of distributed computing whereby a virtual supercomputer is formed from many loosely coupled computers acting together to perform large tasks. For certain applications, grid computing can be seen as a special type of parallel computing that relies on complete computers (with onboard CPUs, storage, power supplies, network interfaces, etc.) connected to a network (private, public, or the Internet) via a conventional network interface, such as Ethernet.[1] This is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus.

One of the main strategies of grid computing is to use middleware to divide and apportion pieces of a program among several computers, sometimes up to many thousands. Grid computing involves computation in a distributed fashion, which may also involve the aggregation of large-scale clusters. Grids vary in size from a small group of workstations confined to a network within a corporation (intra-node cooperation) to large public collaborations across many companies and networks (inter-node cooperation).
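The dividing and apportioning step can be illustrated with a minimal sketch. It assumes the problem is just a list of independent inputs and that nodes are identified by plain strings; the function names `make_work_units` and `assign_round_robin` and the node names are hypothetical and do not come from any particular middleware.

```python
from itertools import cycle


def make_work_units(items, unit_size):
    """Split a list of inputs into consecutive chunks ("work units")."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def assign_round_robin(work_units, nodes):
    """Hand work units to nodes in turn; returns {node: [unit, ...]}."""
    if not nodes:
        raise ValueError("at least one node is required")
    assignment = {node: [] for node in nodes}
    for unit, node in zip(work_units, cycle(nodes)):
        assignment[node].append(unit)
    return assignment


# Hypothetical example: ten inputs, units of three, two nodes.
units = make_work_units(list(range(10)), unit_size=3)
plan = assign_round_robin(units, ["node-a", "node-b"])
```

Real middleware would also have to account for node speed, availability, and data locality; round-robin assignment is used here only because it is the simplest policy to read.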

History

The idea of grid computing originated in the 1990s as a metaphor for making computer power as easy to access as an electric power grid. Where parallel computing and supercomputers were primarily used in the '80s and '90s, grid computing began to take shape as an option by the mid-1990s. In 1995, the Information-Wide Area Year (I-WAY) project was initiated, dedicated to the integration of other existing high-bandwidth networks and the management of software run over them. This project stood out as one of the first major milestones towards true grid computing.[2][3] Not long afterwards, CPU scavenging and volunteer computing projects like distributed.net in 1997[4] and SETI@home in 1999[5] began to harness the power of networked PCs worldwide to solve CPU-intensive research problems.

Grid computing was further refined with Ian Foster and Carl Kesselman's widely regarded 1998 work The Grid: Blueprint for a New Computing Infrastructure, in which they set out to define and extend the concepts surrounding the idea.[6] Together with software developer Steven Tuecke, they had previously lent their expertise to the I-WAY project, particularly through their own Globus Project, which would link sites into a "virtual organization" for scientific collaboration.[7] The group would not only further refine the definition in 2002, but they would also lead the release of the Globus Toolkit, an open-source toolkit for grid computing.[7]

In 2007 the term "cloud computing" came into popularity. The concept is similar to the canonical Foster definition of grid computing (in terms of computing resources being consumed as electricity is from the power grid). Indeed, grid computing is often (but not always) associated with the delivery of cloud computing systems.[8]

Grid computing vs. supercomputing

While supercomputing is essentially parallel computing,[9] grid computing is a special type of parallel computing that relies on complete computers connected to a network. The primary performance disadvantage of grid computing relative to supercomputing is that the various processors and local storage areas typically do not have high-speed connections. This arrangement is thus well-suited to applications in which multiple parallel computations can take place independently, without the need to communicate intermediate results between processors.[10]
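A small sketch may make the "independent computations" property concrete. It assumes a toy problem (a sum of squares) and uses a local thread pool purely as a stand-in for separate grid nodes; the function names are hypothetical. The point is that each work unit needs only its own input, and the partial results meet only once, at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(bounds):
    """Compute one work unit: the sum of n*n over [start, stop)."""
    start, stop = bounds
    return sum(n * n for n in range(start, stop))


def run_independent(work_units, workers=4):
    """Run every unit with no communication between units, then merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, work_units))
    # The only point where results from different units are combined.
    return sum(partials)


# Hypothetical split of the range 0..999 into four equal units.
work_units = [(0, 250), (250, 500), (500, 750), (750, 1000)]
total = run_independent(work_units)
```

A computation whose units had to exchange intermediate values at every step would not fit this pattern, which is where the slower interconnect of a grid becomes the limiting factor.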

Additionally, differences in programming and deployment exist. It can be costly and difficult to write programs that can run in the environment of a supercomputer, which may have a custom operating system or require the program to address concurrency issues.[11] If a problem can be adequately parallelized, a "thin" layer of "grid" infrastructure can allow conventional, standalone programs — given a different part of the same problem — to run on multiple machines. This makes it possible to write and debug on a single conventional machine, and it eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.

Design considerations

One feature of distributed grids is that they can be formed from computing resources belonging to multiple administrative domains. This can facilitate commercial transactions, as in utility computing, or make it easier to assemble volunteer computing networks. However, the computers which are actually performing the calculations might not be entirely trustworthy, requiring additional security measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results. Authentication, authorization, and encryption methods must all be employed to ensure "the integrity and confidentiality of the data processed within the grid."[12]
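As one illustration of an integrity measure, the sketch below tags each returned result with an HMAC so that the coordinator can detect a result that was altered in transit or submitted under the wrong work unit. It assumes a secret key already shared between the coordinator and the node; the function names, the key, and the work-unit identifiers are hypothetical. It does not by itself detect a node that holds the key and deliberately computes a wrong answer.

```python
import hashlib
import hmac


def sign_result(key: bytes, work_unit_id: str, result: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding a result to its work unit."""
    message = work_unit_id.encode("utf-8") + b"\x00" + result
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_result(key: bytes, work_unit_id: str, result: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison."""
    expected = sign_result(key, work_unit_id, result)
    return hmac.compare_digest(expected, tag)


# Hypothetical values for illustration only.
key = b"demo-shared-secret"
tag = sign_result(key, "unit-42", b"3.14159")
accepted = verify_result(key, "unit-42", b"3.14159", tag)
tampered = verify_result(key, "unit-42", b"2.71828", tag)
```

Guarding against incorrect computation usually needs a separate mechanism, such as sending the same work unit to more than one node and comparing the answers.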

The impacts of trust and availability on performance and development can influence the choice of whether to deploy onto a dedicated cluster, to idle machines internal to the developing organization, or to an open external network of volunteers or contractors.[12] In many cases, the participating nodes must trust the central system not to abuse the access that is being granted, by interfering with the operation of other programs, mangling stored information, transmitting private data, or creating new security holes. Other systems employ measures to reduce the amount of trust "client" nodes must place in the central system, such as placing applications in virtual machines.

Public systems or those crossing administrative domains (including different departments in the same organization) often result in the need to run on heterogeneous systems, using different operating systems and hardware architectures.[12] With many languages, there is a trade-off between investment in software development and the number of platforms that can be supported (and thus the size of the resulting network). Cross-platform languages can reduce the need to make this trade-off, though potentially at the expense of high performance on any given node (due to run-time interpretation or lack of optimization for the particular platform).[13]

Projects and applications

Grid computing offers a way to solve large-scale problems such as protein folding, financial modeling, and geographic and meteorological simulations. Grids offer a way of using the information technology resources optimally inside an organization. Examples include:

  • As of September 2014, the open-source Berkeley Open Infrastructure for Network Computing (BOINC) platform was being used by over 3.2 million users. Of those, 1.5 million were attached to SETI@home, a project dedicated to detecting intelligent life beyond Earth, achieving an average of 1,969 teraFLOPS (trillions of floating-point operations per second).[14]
  • As of September 2014, Folding@home, a project dedicated to disease research, achieves 39,990 TeraFLOPS from over 172,000 users.[15]
  • As of September 2014, the Worldwide LHC Computing Grid — dedicated to handling nearly 30 petabytes of data per year from the Large Hadron Collider — involves 40 countries, 170 computing centers, and two million jobs run every day, making it one of the largest grid computing projects ever.[16]
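To put the figures above on a per-participant and per-second footing, the following sketch does the arithmetic. It assumes decimal units (1 teraFLOPS = 1,000 gigaFLOPS, 1 petabyte = 10^15 bytes), a 365-day year, and load spread evenly over time. Because the user counts are quoted as "over" a round number, the per-user values are rough averages rather than measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def per_user_gflops(total_teraflops, users):
    """Average gigaFLOPS contributed per user."""
    return total_teraflops * 1000 / users


# Figures as quoted in the list above (September 2014).
seti_gflops = per_user_gflops(1_969, 1_500_000)
fah_gflops = per_user_gflops(39_990, 172_000)

# WLCG: nearly 30 petabytes per year and two million jobs per day.
wlcg_bytes_per_second = 30e15 / SECONDS_PER_YEAR
wlcg_jobs_per_second = 2_000_000 / SECONDS_PER_DAY

print(f"SETI@home:    {seti_gflops:.2f} GFLOPS per user")
print(f"Folding@home: {fah_gflops:.1f} GFLOPS per user")
print(f"WLCG data:    {wlcg_bytes_per_second / 1e9:.2f} GB/s on average")
print(f"WLCG jobs:    {wlcg_jobs_per_second:.1f} jobs/s on average")
```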

Further reading


See also

Notes

This article reuses numerous content elements from the Wikipedia article.

References

  1. 1.0 1.1 Magoulès, Frédéric (2009). Fundamentals of Grid Computing: Theory, Algorithms and Technologies. CRC Press. pp. 322. ISBN 9781439803684. http://books.google.com/books?id=Dei9vpdQiHcC&pg=PA1. Retrieved 17 September 2014. 
  2. Berman, Fran; Fox, Geoffrey; Hey, Tony (2003). "Chapter 1: The Grid: Past, Present, and Future". Grid Computing: Making the Global Infrastructure a Reality. John Wiley and Sons. pp. 9–50. ISBN 9780470853191. http://books.google.com/books?id=b4LWXLRBRLsC&pg=PA3. Retrieved 17 September 2014. 
  3. National Science and Technology Council's Committee on Computing, Information, and Communications (November 1996). High Performance Computing and Communications: Advancing the Frontiers of Information Technology. National Science and Technology Council. https://www.nitrd.gov/pubs/bluebooks/1997/cover-front.html. Retrieved 17 September 2014. 
  4. "distributed.net History & Timeline". distributed.net. http://www.distributed.net/history. Retrieved 17 September 2014. 
  5. "About SETI@home". SETI@home. University of California. http://setiathome.berkeley.edu/sah_about.php. Retrieved 17 September 2014. 
  6. Foster, Ian (20 July 2002). "What is the Grid? A Three Point Checklist". http://dlib.cs.odu.edu/WhatIsTheGrid.pdf. Retrieved 17 September 2014. 
  7. 7.0 7.1 Braverman, Amy M. (April 2004). "Father of the Grid". The University of Chicago Magazine 96 (4). http://magazine.uchicago.edu/0404/features/index.shtml. Retrieved 17 September 2014. 
  8. Myerson, Judith M. (3 March 2009). "Cloud computing versus grid computing". IBM developerWorks. IBM. http://www.ibm.com/developerworks/library/wa-cloudgrid/. Retrieved 17 September 2014. 
  9. Lafferty, Eduard L.; Michaud, Marion C.; Prelle, Myra Jean; Goethert, Joan B. (1993). Parallel Computing: An Introduction. Noyes Data Corporation. pp. 146. ISBN 9781437744934. http://books.google.com/books?id=Xb2GAAAAQBAJ&pg=PA1. Retrieved 17 September 2014. 
  10. "Computational problems". GridCafé. e-ScienceTalk. http://www.e-sciencecity.org/EN/gridcafe/computational-problems.html. Retrieved 17 September 2014. 
  11. Graham, Susan L.; Snir, Marc; Patterson, Cynthia A. (2005). Getting Up To Speed: The Future of Supercomputing. The National Academies Press. pp. 134–135. ISBN 9780309165518. http://books.google.com/books?id=Llhr49iPJNIC&pg=PA134. Retrieved 17 September 2014. 
  12. 12.0 12.1 12.2 Jacob, Bart; Brown, Michael; Fukui, Kentaro; Trivedi, Nihar (December 2005) (PDF). Introduction to Grid Computing. IBM. pp. 248. http://www.redbooks.ibm.com/redbooks/pdfs/sg246778.pdf. Retrieved 17 September 2014. 
  13. Plaszczak, Pawel; Wellner, Jr., Richard (2005). Grid Computing: The Savvy Manager's Guide. Elsevier. pp. 38–39. ISBN 9780080470764. http://books.google.com/books?id=ZyEoEn0_ITIC&pg=PA38&lpg=PA38. Retrieved 17 September 2014. 
  14. "Project statistics - SETI@home". All Project Stats. BOINC. http://www.allprojectstats.com/po.php?projekt=15. Retrieved 17 September 2014. 
  15. "Folding@home Client statistics by OS". Folding@home. Pande Lab, Stanford University. http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats2. Retrieved 17 September 2014. 
  16. "Welcome - Worldwide LHC Computing Grid". CERN. http://wlcg-public.web.cern.ch/. Retrieved 17 September 2014.