LII:Medical Device Software Development with Continuous Integration/Validation

Overview of unit tests

In computer programming, unit testing is a method by which individual units of source code are tested to determine if they are fit for use. A unit is the smallest testable part of an application. In procedural programming a unit may be an individual function or procedure. In object-oriented programming a unit is usually a method. Unit tests are created by programmers or occasionally by white box testers during the development process. In the world of Java, we have a number of popular options for the implementation of unit tests, with JUnit and TestNG being, arguably, the most popular. Examples provided in this article will use TestNG syntax and annotations.

Traditionally (and by traditionally, I mean in their relatively brief history), unit tests have been thought of as very simple tests to validate basic inputs and outputs of a software method. While this can be true, and such simple tests can provide some value, it is possible to achieve much more with unit tests. In fact, it is not only possible but recommended that we implement much of our user acceptance, functional, and possibly even some non-functional tests within a unit test framework.

To further enhance quality, we can augment acceptance testing with unit tests.[1] While I personally have never been a fan of test-driven development (I feel that the assumptions required by test-driven development do not allow for a true iterative approach), I do believe that creation of unit tests in parallel with development leads to much higher-quality software. In the world of Agile, this means that no functional requirement (or user story) is considered fully implemented without a corresponding unit test. This strict view of unit tests may be a bit extreme, but it is not without merit.

The first unit test a developer may ever write is likely so simple that it's nearly useless. It may go something like this.

Given a method:

public int doSomething(int a, int b) {
    int c = a + b;
    return c;
}

A simple unit test may look something like this:

import static org.testng.Assert.assertEquals;

import org.testng.annotations.Test;

public class MyUnitTests {

    @Test
    public void testDoSomething() {
        // doSomething() is the method under test, shown above; 1 + 2 should yield 3.
        assertEquals(doSomething(1, 2), 3);
    }
}

Given a very simple method, the developer is able to assert that, essentially, a + b = c. This is easy to write, and there is little overhead involved, but it really isn’t a very useful unit test.

Early attempts to automate functional testing

Long ago I was involved with a project in which management invested a significant amount of time and training in an attempt to implement automated testing. The chosen tool was Rational Robot (now an IBM product). The idea behind tools such as Robot was that a test creator could record test macros, note points of verification, and replay the macros later with test results noted. Tools such as Rational Robot and WinRunner attempted to replace the human tester with recorded scripts. These automated scripts could be written using a scripting language or, more commonly, by recording mouse movements, clicks, and keyboard actions. In this regard, these tools of test automation allowed black-box testing through a user interface.

In this over-simplified view of automated testing, there were simply too many logistical problems with test implementation to make them practical. Any minor changes to the user interface would result in a broken test script. Those responsible for maintaining these automated scripts often found themselves spending more time maintaining the tests than using them for actual application testing.

Rational Robot and tools like it are alive and well, but I refer to them in the past tense because such tools, in my experience, have proven themselves to be a failure. I say this because I have personally spent significant amounts of time creating automated scripts in such tools, and I have been frustrated to learn later that they would not be used because of the substantial amount of interface code that changes as a project progresses. Such changes are absolutely expected, and yet, a recorded automated test does not lend itself well to an iterative development environment or an ongoing project.

Automating functional tests using a unit test framework

Most software projects, especially in any kind of Agile environment, undergo frequent changes and refactoring. If the traditional single-flow waterfall model worked, recorded test scripts such as those noted previously would probably work just fine as well, albeit with little benefit.

But it should be well known by now that the traditional single-flow waterfall model has failed, and we live in an iterative/Agile world. As such, our automated tests must be equally equipped for ongoing change. And because the functional unit tests are closely related to requirements at both a white-box and black-box level, developers, not testers, have an integral role in the creation of automated tests.

To achieve this level of unit testing, a test framework must be in place. This requires a bit of up-front effort, and the details of creating such a framework go well beyond the scope of this article. Additionally, the needs of a test framework will vary depending on the project.

Test fixtures become an important part of complex functional unit testing. A test fixture is a class that incorporates all of the setup necessary for running such unit tests. It provides methods that can create common objects (for example, test servers and mock interfaces). The details included in a test fixture are specific to each project, but some common methods include test setup, simulation, and mock object creation and destruction, as well as declaration of any common functionality to be used across unit tests. A full treatment of test fixture creation is beyond the scope of this article.
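
Still, a minimal sketch can make the idea concrete. The toy fixture below is purely illustrative (the class and its in-memory credential store are hypothetical stand-ins for a real test server or mock interface), but it shows the general shape such a class might take:

import java.util.HashMap;
import java.util.Map;

/** Hypothetical test fixture: stands up a simulated environment for unit tests. */
public class Fixture {

    /** Minimal stand-in for a real user session object. */
    public static class UserSession {
        private final boolean active;

        UserSession(boolean active) {
            this.active = active;
        }

        public boolean active() {
            return active;
        }
    }

    private Map<String, String> users;   // simulated credential store

    /** Create the common objects the tests need (test data, mock interfaces, and so on). */
    public void start() {
        users = new HashMap<>();
        users.put("test_user", "test_password");
    }

    /** Destroy mock objects and tear the simulated environment down. */
    public void stop() {
        users = null;
    }

    /** Common functionality shared across many unit tests. */
    public UserSession login(String user, String password) {
        return new UserSession(password != null && password.equals(users.get(user)));
    }
}

A real fixture would typically also launch an embedded server, wire up mock interfaces, and seed temporary test data, but the pattern is the same.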

Given what may seem like extreme overhead in the creation of complex unit tests, we may begin to question their value. There is, no doubt, a significant up-front cost to creating a versatile and useful unit test framework, including a test fixture that provides all the objects and setup needed to simulate a running environment for testing. And given that manual functional and user acceptance testing remains a project necessity, it may seem like there is an overlap of effort.

But this is not the case.

With a little up-front work on a solid unit test framework, we can make the creation of individual unit tests simple. We can even go as far as requiring a unit test for any functional requirement implementation before that requirement (or ticket) is considered complete. Furthermore, as we discover potential functionality problems, we have the opportunity to introduce a new test right then and there!

This emphasis on systematic testing aligns with the FDA's view of automated systems: hardware systems, software programs, and general quality assurance system controls are essential in the automated manufacture of medical devices, and the systematic validation of software and associated equipment will assure compliance with the QS regulation and reduce confusion, increase employee morale, reduce costs, and improve quality. Further, proper validation will smooth the integration of automated production and quality assurance equipment into manufacturing operations. Medical devices and the manufacturing processes used to produce them vary from the simple to the very complex; thus, the QS regulation needs to be, and is, a flexible quality system. This flexibility is increasingly valuable as more device manufacturers move to automated production, test/inspection, and record-keeping systems.[2]

What is a good unit test?

In his book Safe and Sound Software, Thomas H. Faris describes the unit test as such:

Software testing may occur on software modules or units as they are completed. Unit testing is effective for testing units as they are completed, when other units or components have not yet been completed. Testing still remains to be completed to ensure that the application will work as intended when all software units or components are executed together.[3]

This is a start, but unit tests can achieve so much more! Faris goes on to describe a number of different categories of software tests[3]:

  • Black box test
  • Unit test
  • Integration test
  • System test
  • Load test
  • Regression test
  • Requirements-based test
  • Code-based test
  • Risk-based test
  • Clinical test

Traditionally this may be considered a fair list. Used wisely, and with the proper framework, however, we can perform black box, integration, system, load, regression, requirements-based, code-based, risk-based, and clinical tests with efficient unit tests that simulate a true production environment. The purpose of this article is not to go into the technical details of how (to explain unit test frameworks, fixtures, mock objects, and simulations would require much more space). Rather, I simply want to point out the benefits that result. To achieve these benefits, your software team will need to develop a deep understanding of unit tests. It will take some time, but it will be time very well spent.

It’s a good idea to have unit tests that go above and beyond what we traditionally think of as unit tests, going several steps further to automate functional testing. This is another one of those areas where team members often (incorrectly) feel that there is not sufficient time to do all the work. As Faris goes on to state:

Software testing and defect resolution are very time-consuming, often draining more than one-half of all effort undertaken by a software organization ... Testing need not wait until the entire product is completed; iteratively designed and developed code may be tested as each iteration of code is completed. Prior to beginning of verification or validation, the project plan or other test plan document should discuss the overall strategy, including types of tests to be performed, specific functional tests to be performed, and a designation of test objectives to determine when the product is sufficiently prepared for release and distribution.[3]

Faris is touching on something that is very important in our FDA-regulated environment: the fact that we must document and describe our tests. For our unit tests to be useful, we must provide documentation of what each test does (that is, what specifically it is testing) and what the results are. The beauty of unit tests and the tools available (incorporation into our continuous integration environment) is that this process is streamlined in a way that makes the traceability and re-creation of test conditions required for our 510(k) extremely easy!

To achieve all of this we will need to have a testing framework capable of application launch, simulations, mock objects, mock interfaces and temporary data persistence. This all sounds like much more overhead than it actually is, but fear not: the benefits far outweigh the costs.
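
To illustrate just one of those pieces, a mock interface can be as simple as a hand-written class that returns canned data. The PatientLookup interface and its mock below are hypothetical examples rather than part of any real project:

/** Hypothetical interface that would normally be backed by an external system. */
public interface PatientLookup {
    String findPatientName(String patientId);
}

/** Mock implementation used only by the unit tests: it returns canned, deterministic data. */
class MockPatientLookup implements PatientLookup {
    @Override
    public String findPatientName(String patientId) {
        // No network and no database, so tests that use this mock stay fast and repeatable.
        return "TEST-PATIENT-" + patientId;
    }
}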

What is the value of unit testing?

Immediate feedback within continuous integration: Developer confidence

Too often we view testing as an activity that occurs only at specific times during software development. At worst, software testing takes place upon completion of development (which is when it is learned that development is nowhere near complete). In other more zealous environments, it may take place at the end of each iteration. We can do better! How about complex unit tests performing validation continuously, with each code change? It is possible to perform full regression tests with every single code change. It sounds like a significant amount of overhead, but it is not. The real cost to a project is not the attention given to complex functional unit tests; the real danger is that we put off testing until it is too late to react to a critical issue discovered during some predetermined testing phase.

The most effective way of killing a project is to organize it so that testing is pushed into a single phase so critical to the schedule that we leave no room for testing to do what it is supposed to do: discover defects prior to go-live.

At its most basic level, a continuous integration build environment does just one thing: it runs whatever scripts we tell it to. To that end, it is important that the CI build execute unit tests and that a failure of any single unit test is considered a failure of the continuous integration build. The power of a tool such as Jenkins is that we can tell it to run whatever we want, log the outcome, keep build artifacts, run third-party evaluation tools, and report on results. With integration of our software version control system (e.g., Subversion, Git, Mercurial, CVS, etc.), we know the changeset relevant to a particular build. It can be configured to generate a build at whatever interval we want (nightly, hourly, every time there is a code commit, etc.). When a test fails, we know immediately what changeset was involved.
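
As a rough sketch of that wiring (assuming the MyUnitTests class shown earlier; most projects would let their build tool invoke TestNG rather than writing a runner by hand), a CI job could execute something like the following and treat a non-zero exit status as a failed build:

import org.testng.TestNG;

/** Hypothetical command-line entry point a CI job could call to run the unit test suite. */
public class CiTestRunner {

    public static void main(String[] args) {
        TestNG testng = new TestNG();
        testng.setTestClasses(new Class[] { MyUnitTests.class });
        testng.run();

        // Any failed test produces a non-zero exit status, which the CI server
        // (Jenkins, for example) interprets as a failed build.
        System.exit(testng.hasFailures() ? 1 : 0);
    }
}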

Personally, every time I do any code commit of significance, one of the first things I do is check the CI build for success. If I've broken the build, I get to work on correcting the problem (and if I cannot correct the problem quickly, I back my changeset out so that the CI build continues to work until I’ve fixed the issue).

Easy refactoring

As a developer, refactoring can be a scary thing. Refactoring is perhaps the most effective way of introducing a serious defect while doing something that seems innocuous. With thorough unit tests performing a full regression test with each and every committed software changeset, however, a developer can have confidence that his or her simple code changes have not introduced a defect. We have continuous integration builds running our tests for many reasons, not the least of which is to alert developers to the possibility that their changes have broken the build.

As a developer I strive to avoid breaking the continuous integration build. When I do break it, however, I am very pleased to know that what was done to cause a problem has been discovered immediately. Correction of a defect becomes much more costly when its discovery is not noticed until the end of a development phase!

Regression tests with every code change

When I speak of "repeated" tests, I mean something different from merely repeatable tests. The fundamental benefit of repeated tests is that a test can be executed many more times by automation than by a human tester. Sometimes, even without a related code change, and much to our surprise, we see a test suddenly fail where it succeeded numerous times before. What happened?

The most difficult software defects to find (much less fix) are the ones that do not happen consistently. Database locking issues, memory issues, deadlock bugs, memory leaks, and race conditions can result in such defects. These defects are serious, but if we never detect them, how can we fix them?

As stated previously, it is imperative that we have unit tests that go above and beyond what we traditionally think of as unit tests, going several steps further to automate functional testing. This is another one of those areas where team members often (incorrectly) feel that there is not sufficient time to deal with the creation of unit tests. Given a proper framework, however, creation of unit tests need not be overwhelming.

Another occasional issue has to do with misuse of the software version control system. Many developers know the frustration that can come with an accidental code change resulting from one developer stepping over the modifications of another. While this is a rare issue in a properly used version control environment, it does still happen, and unit tests can quickly reveal such a problem at build time.

Concurrency tests

Concurrency tests are tricky, and it is in concurrency testing that the repeated and rapid nature of functional unit tests can shine where human testers cannot. I personally have witnessed many occasions in which a CI build suddenly fails for no obvious reason. There was no code commit related to the particular point-of-failure, and yet a unit test that once succeeded suddenly fails? Why?

This can happen (and it does happen) because concurrency problems, by their very nature, are hit or miss. Sometimes they are so unlikely to occur that we never witness them during the course of normal testing. When a continuous integration environment runs concurrency tests dozens of times a day, however, we increase the likelihood of finding a hidden and menacing problem. Additionally, unit tests can simulate many concurrent users and processes in a way that even a team of human testers cannot.
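
TestNG makes this kind of repeated, parallel execution easy to express. The sketch below reuses the hypothetical Fixture from earlier and hammers the login path from many threads at once; the invocation and thread counts are arbitrary:

import static org.testng.Assert.assertNotNull;

import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class ConcurrentLoginTests {

    private final Fixture fixture = new Fixture();

    @BeforeClass
    public void setUp() {
        fixture.start();
    }

    // Execute the same test body 100 times across 20 concurrent threads,
    // failing any invocation that takes longer than 10 seconds.
    @Test(invocationCount = 100, threadPoolSize = 20, timeOut = 10000)
    public void testConcurrentUserLogin() {
        assertNotNull(fixture.login("test_user", "test_password"));
    }
}

An intermittent locking or race-condition defect in the session-handling code is far more likely to surface under this kind of load than under a single, manual test run.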

Repeatable and traceable test results

This is the key to making our unit tests adhere to the standards we have set forth in our quality system so that we may use them as a part of our submission (see the following section on Regulated Environment Needs). If we are going to put forth the effort, and since we already know that unit tests result in a quality improvement to our software, why wouldn’t we want to include these test results?

Our continuous integration server can and should be used to store our unit test results right alongside each and every build that it performs.

This is a benefit not only in the world of an FDA-regulated environment, of course. In any software project it can be difficult to recreate conditions under which a defect was discovered. With a CI build executing our build and test scripts under a known environment with a known set of files (the CI build tool pulls from the version control system), it is possible to execute the tests under exact and specific circumstances.

Many of the benefits of functional unit testing listed above are gained only when unit tests are written alongside design and development (test-driven methodologies aside). It is imperative that the development team develop and observe test results while design and development activities take place. This is of benefit to the quality assurance team as well, as Dean Leffingwell notes:

A comprehensive unit test strategy prevents QA and test personnel from spending most of their time finding and reporting on code-level bugs and allows the team to move its focus to more system-level testing challenges. Indeed, for many agile teams, the addition of a comprehensive unit test strategy is a key pivot point in their move toward true agility — and one that delivers "best bang for the buck" in determining overall system quality.[4]

Also, it is probably becoming clear that a key benefit of functional unit tests is the real-time feedback offered to the development team. Humble and Farley refer to the unit tests that are executed with each software change as "commit tests."[5]

Commit tests that run against every check-in provide us with timely feedback on problems with the latest build and on bugs in our application in the small.[5]

Project unit tests, which should offer a significant amount of coverage (at least 80 percent), provide the team with built-in software change-commit acceptance criteria. If a developer causes the CI build to fail because of a code change, it is immediately known that the change does not meet the minimum acceptance criteria and requires urgent attention.

Humble and Farley continue:

Crucially, the development team must respond immediately to acceptance test breakages that occur as part of the normal development process. They must decide if the breakage is a result of a regression that has been introduced, an intentional change in the behavior of the application, or a problem with the test. Then they must take appropriate action to get the automated acceptance test suite passing again.[5]

Regulated environment needs

Per 21 CFR Part 820.30 on design controls:

(f) Design verification. Each manufacturer shall establish and maintain procedures for verifying the device design. Design verification shall confirm that the design output meets the design input requirements. The results of the design verification, including identification of the design, method(s), the date, and the individual(s) performing the verification, shall be documented in the design history file (DHF).[6]

Simply put, our functional unit tests must be a part of our DHF, and we must document each test and test result (success or failure) as well as tie tests and outcomes to specific software releases. This is made extremely easy with a continuous integration environment in which builds and build outcomes (including test results) are stored on a server, labeled, and linked to from our DHF. Indeed, what is sometimes a tedious task when it comes to manual execution and documentation of test results becomes quite convenient.

The same is true of design validation:

(g) Design validation. Each manufacturer shall establish and maintain procedures for validating the device design. Design validation shall be performed under defined operating conditions on initial production units, lots, or batches, or their equivalents. Design validation shall ensure that devices conform to defined user needs and intended uses and shall include testing of production units under actual or simulated use conditions. Design validation shall include software validation and risk analysis, where appropriate. The results of the design validation, including identification of the design, method(s), the date, and the individual(s) performing the validation, shall be documented in the DHF.[6]

Because our CI environment packages build and test conditions at a given point in time, we can successfully satisfy the requirements laid out by 21 CFR Part 820.30 (f) and (g) with very little effort. We simply allow our CI environment to do that which it does best, and that which a human tester may spend many hours attempting to do with accuracy.

Document the approach

As discussed, all these tests are indeed very helpful to the creation of good software. However, without a wise approach to incorporating such tests into our FDA-regulated environment, they are of little use in any auditable capacity. It is necessary to describe our approach to unit test usage and documentation within our standard operating procedures (SOPs) and work instructions, in much the same way that we would describe any manual verification and validation test activities.

To this end, it is necessary to make our unit tests and their outputs an integral part of our DHF. Each test must be traceable, and this means that unit tests are given unique identifiers. These unique identifiers are very easily assigned using an approach in which we organize tests in logical units (for example, by functional area) and label tests sequentially.

Label and trace tests

An approach that I have taken in the past is to assign some high-level numeric identifier and a secondary sub-identifier that is used for the specific test. For example, we may have the following functional areas: user session, audit log, data input, data output, and web user interface tests (these are very generic examples of functional areas, granted). Given such functional areas, I would label each test using test naming annotations, with the following high level identifiers:

  • 1000: user session tests
  • 2000: audit log tests
  • 3000: data input tests
  • 4000: data output tests
  • 5000: web user interface tests

Within each functional area it is then necessary to go a step further, applying a sequential identifier to each test. For example, the user session test package may include tests for functional requirements such as user login, user logout, session expiration, and a multiple-user login concurrency test. In such a scenario, we would label the tests as follows:

  • 1000_010: user login
  • 1000_020: user logout
  • 1000_030: session expiration
  • 1000_040: multiple concurrent user login

Using TestNG syntax, along with proper Javadoc comments, it is very easy to label and describe a test such that inclusion in our DHF is indeed very simple.

/**
 * Test basic user login and session creation with a valid user.
 *
 * @throws Exception
 */
@Test(dependsOnMethods = {"testActivePatientIntegrationDisabled"},
      groups = {"TS0005_AUTO_VE1023"})
public void testActivePatientIntegrationEnabled() throws Exception {
    Fixture fixture = new Fixture();
    UserSession mySession = fixture.login(test_user, test_password);
    assertNotNull(mySession);
    assertTrue(mySession.active());
}

Any numbering we choose to use for these tests is fine, as long as we document our approach to test labeling in some project level document, for example a validation plan or master test plan. Such decisions are left to those who design and apply a quality system for the FDA-regulated project. As most of us know by now, the FDA doesn’t tell us exactly how we are to do things; rather, we are simply told that we must create a good quality system, trace our requirements through design, incorporate the history in our DHF, and recreate build and test conditions.

If I make this all sound a little too easy, it is because I believe it is easy. Too often we view cGMP guidance as a terrible hindrance to productivity, but we are in control of making things as efficient as we can.

The traceability matrix

A critical factor in making unit tests usable in an auditable manner is incorporating them into the traceability matrix. As with any test, requirements, design elements, and hazards must be traced to one another through use of the traceability matrix.

The project team must document traceability of requirements through specification and testing to ensure that all requirements have been tested and correctly implemented (product requirements traceability matrix).[3]

With each automated test labeled, we can use the built-in JUnit or TestNG functionality (along with XSLT, if we so choose) to create output that is tied to the build number and changeset and traceable within our trace matrix. The output of our tests (which are run during each continuous integration build) may be as follows:

TEST NAME              STATUS
TS0005_AUTO_VE1022     PASS
TS0005_AUTO_VE1023     PASS
TS0005_AUTO_VE1024     FAIL
TS0005_AUTO_VE1025     SKIP
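
One way (of several) to produce a report like this is a small TestNG listener. The sketch below is hypothetical; it simply prints one line per test method, using the test's group names as the traceable identifiers:

import org.testng.ITestResult;
import org.testng.TestListenerAdapter;

/** Hypothetical listener that prints one "test identifier / status" line per test method. */
public class TraceabilityListener extends TestListenerAdapter {

    @Override
    public void onTestSuccess(ITestResult result) {
        report(result, "PASS");
    }

    @Override
    public void onTestFailure(ITestResult result) {
        report(result, "FAIL");
    }

    @Override
    public void onTestSkipped(ITestResult result) {
        report(result, "SKIP");
    }

    private void report(ITestResult result, String status) {
        // The TestNG group names carry the traceable test identifiers (e.g., TS0005_AUTO_VE1023).
        String id = String.join(",", result.getMethod().getGroups());
        System.out.println(id + "\t" + status);
    }
}

TestNG's standard XML and HTML reports, transformed with XSLT if desired, accomplish much the same thing and can be archived by the CI server alongside each build.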

Naturally, we hope that all the automated tests pass, but when they fail we need to record the failure. It’s my opinion that placing all test outcomes in the DHF is not necessary. Rather, the DHF can point to the continuous integration build server, where automated test results are bundled alongside each build. Finally, at the end of a sprint or iteration, the appropriate test results for the final locked-down build are captured in the DHF and traced appropriately per SOPs.

Our SOPs and work instructions will require that we prove traceability of our tests and test results, whether manual or automated unit tests. Just as has always been done with the manual tests that we are familiar with, tests must be traced to software requirements, design specifications, hazards, and risks. The goal is simply to prove that we have tested that which we have designed and implemented, and in the case of automated tests, this is all very easy to achieve!

Do we still need manual tests?

Yes! Absolutely! There are a number of reasons why manual tests are still, and always will be, required. Take for example installation qualification and environmental tests. Both manual and automated tests are valid and valuable, and neither should be considered a replacement for the other.

I recall being a child in karate lessons. One day I came home from a lesson, very proud that I had learned to block a punch. "Come at me with a punch," I said to my friend.

Doing what I asked, he punched me right in the chest, and I failed to block the punch. This punch wasn't thrown the way I expected (the way we practiced in karate lessons).

"No, no, no! You’re punching me the wrong way!" I said. I only knew how to block one kind of punch, and when punched a different way, my block no longer worked. To me, this karate lesson highlights the difference between an exception and an error. Automated tests can provide error test coverage very well. But when thrown something unanticipated, they don’t offer the creativity in and of themselves to find the issue.

It is up to us, developers and testers, to come up with creative punches to throw at our system. This is where manual testing allows a certain amount of "creative" punching that may not have been considered during unit test development.

Perhaps even more importantly, manual tests give feedback on general application usability and user interaction in a way automated tests cannot. And the two approaches feed each other: a defect discovered during manual testing should result in a new automated test, so that it cannot silently reappear.

Notes

The original author had anticipated writing about the following sub-topics but never did: test fixture, mock objects, avoiding the Singleton design pattern, in-memory DB, and in-memory servlet container.

References

  1. Leffingwell, D. (2011). Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley Professional. p. 61. ISBN 9780321635846. 
  2. "General Principles of Software Validation; Final Guidance for Industry and FDA Staff". Food and Drug Administration. 11 January 2002. http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm085281.htm. Retrieved 27 April 2016. 
  3. Faris, T.H. (2006). Safe and Sound Software: Creating an Efficient and Effective Quality System for Software Medical Device Organizations. ASQ Quality Press. pp. 118–123. ISBN 0873896742. 
  4. Leffingwell, D. (2011). Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley Professional. p. 196. ISBN 9780321635846. 
  5. Humble, J.; Farley, D. (2010). Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley Professional. p. 124. ISBN 9780321601912. 
  6. "Title 21--Food and Drugs, Part 820--Quality System Regulation, Sec. 820.30 Design controls". CFR - Code of Federal Regulations Title 21. Food and Drug Administration. 21 August 2015. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfCFR/CFRSearch.cfm?fr=820.30. Retrieved 27 April 2016.