TDD And A New Paradigm For Hardware Verification

We’ve looked at why teams should consider doing TDD; we’ve looked at how the roles and responsibilities change between design and verification experts; now let’s look at when everything happens. This is the key to seeing how everything fits together.

Most teams nowadays choose to split their verification effort into a combination of block level and top level testing. Block level testing is usually applied to every subsystem in a design and is intended to exhaustively cover all the features of each subsystem. Once the block level testing is done, the exhaustively tested pieces are integrated and tested in a top level testbench.

The need for block level testing is justified by the fact that the arm's-length control and visibility normally available in the top level testbench is inadequate or inconvenient for testing the first and second order functionality we talked about in the last post. The need for top level testing is obvious as well because block level testing does nothing to verify the interdependencies between subsystems and the operation of the system as a whole.

Makes sense… block level testing to verify the details; top-level testing to verify the integration. Here’s a picture of that as a timeline with the design, block level and top-level testing.

The designers start writing RTL. Verification engineers start building a block level testbench at the same time (or shortly thereafter as I’ve shown here). When the RTL and the block level testbench are done, the verification team will start running tests and collecting coverage results. When the block level testing is done, the block level design is signed off and it’s ready for top level testing.

The top level testbench, in the meantime, is usually started after the block level design and verification is well underway. The top level tests get run when most of the block level designs near sign-off. At the conclusion of top-level testing, the entire design is signed off and the design goes to the fab (or the FPGA goes to a customer, or the IP goes to another team or whatever). You’re done.

Now let’s look at the problems that normally pop up and the reasons why things never seem to work out as planned.

  1. The RTL is NOT done: I’ve harped on this before so I’m not going to spend much time on it here other than to say that I would challenge anyone who says the RTL Done milestone we all race toward is at all meaningful. You may have a bunch of code, but it’s not tested and it’s not done (see By Example: Done vs. DONE).
  2. The testbench is NOT done: same comment as above. Testbenches aren’t DONE when we say they’re done either. That goes for block level and top level (again… see By Example: Done vs. DONE).
  3. You can’t plan your way through integration testing: here’s the big one and the real reason you’re here today. I really believe I’ve seen top level testing planned as well as anyone could reasonably plan it. Well partitioned design: check. Well defined block level interfaces: check. Standardized model interfaces: check. Uniform, reusable block level environments: check. Customer-initiated use cases: check. Skilled team: check. The ducks were all lined up and still there were the same old issues that tend to blow apart a schedule. A plan is one thing; practice is usually something else.

Take those and other issues into account and what you get in practice is more likely to resemble this…

RTL done and testbench done mean you have a bunch of buggy design and testbench code that doesn’t really work. After saying they’re done, experts from both domains transition into a firefighting role as soon as the first few tests uncover major quality issues (the firefighting is shown as red extensions to the original estimates). While that’s going on, the person in charge of the top level testing finds out that everything doesn’t actually just fit together as planned. Because people are firefighting at the block level and because the block level is given priority, the top level people have to make do until help finally arrives; if it arrives. A lack of help means the top level testing drags on, eventually stumbling across the finish line days, weeks or months later than expected. By the time the design is finally released, people are exhausted from weeks of stress and sleep deprivation.

I used to be of the opinion that teams could plan their way through top level verification. Now I’m sure they can’t.

I think the difference between planned and actual release dates is caused by false confidence and ambiguity in our development process. I think we can overcome both with a new paradigm for hardware development that emphasizes early validation and prioritizes top level development. Here’s a simple diagram of this new paradigm…

The first thing to point out is the alternating test-design cycles at the unit level. This is TDD. That’s meant to clean up localized/unit level bugs in the design before they’re committed to the database. Notice also that the unit level design starts with a purple bar. Test first!!
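
To make “test first” concrete, here’s roughly what I mean; a minimal sketch in plain SystemVerilog with no particular framework, where the 4-bit counter DUT and its expected behaviour are made up purely for illustration:

    // Unit test written *before* the RTL exists. The first "failing test" is
    // literally a compile failure because counter.sv hasn't been written yet.
    module counter_unit_test;
      logic       clk = 0;
      logic       rst_n, en;
      logic [3:0] count;

      counter dut (.clk(clk), .rst_n(rst_n), .en(en), .count(count));

      always #5 clk = ~clk;

      initial begin
        // drive stimulus on the falling edge to avoid races with the DUT
        rst_n = 0; en = 0;
        repeat (2) @(negedge clk);
        rst_n = 1; en = 1;
        repeat (3) @(posedge clk);
        #1;
        if (count !== 4'd3) $fatal(1, "FAIL: expected count of 3, got %0d", count);
        $display("PASS: counter_unit_test");
        $finish;
      end
    endmodule

Write the test, watch it fail, then write just enough RTL to make it pass; that’s one turn of the alternating test-design cycle.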

Next major point is that there is no delay to top level testing; it starts immediately. As portions of the design are unit tested, they are also verified at the top level. Top level tests are where we measure progress. When the top level tests pass, then – and only then – is the code considered DONE.

A third note here is reserved for the role of block level testing. In this new paradigm, block level testing is no longer automatic because TDD is used to prevent many of the first and second order bugs and integration testing happens much sooner. While there may still be a need for block level tests to, for example, exercise certain complex features that are hard to isolate at the top level, exhaustive testing of all blocks within a design would no longer be the norm.

We’ll still do block level testing but it’s no longer “on” by default.

What’s required to keep everything together in this new paradigm for functional verification? One word…

TEAMWORK

The top level effectively takes priority in the testing, which means all the development at the unit and block levels must be synchronized such that it feeds the top level effort successfully. Everyone needs to know what is going on and how their work supports the combined effort of top level testing. That takes teamwork, plain and simple.

-neil

Q. What problems do you see during integration testing? Does unit testing and/or early integration testing address those problems?

15 thoughts on “TDD And A New Paradigm For Hardware Verification”

  1. My son is a s/w developer ( showing my age here ). He uses TDD – and always has. He said “one of the key things about TDD is having fast tests”. He said he was working on reducing the time his tests took. I asked how long they took. He said “33 seconds”. I asked what he thought was reasonable. “6 seconds” he replied.

    I said “we have recently achieved a big success. We got our lockdown tests down to one hour”. He asked why our tests took so long. So I started explaining about constrained randomization. He looked at me with that look that tech savvy kids give their parents. “Why don’t you just test what you want to test ?” he asked.

    So … why don’t we just test what we want to test ? Does constrained random contradict TDD in theory ? [ How can you write the minimum code needed to get a test to pass when you don’t know what your test is testing ? ] Does it contradict TDD in practice ? [ Tests must be quick, constrained random is by definition slow ].

    1. Awesome! I don’t get those comments from my kid yet (he’s only 4) but I already know they’re coming… just a matter of time!

      So far, I think we want a little bit of both. Test what we want to test for the rapid feedback, but also test what we don’t know with longer constrained random (exploratory) cycles. We definitely want to start with the “test what we want to test” though, and that’s something a lot of teams *aren’t* doing. I’m not sure constrained random necessarily contradicts tdd theory provided there’s more emphasis on the “constrained” and less emphasis on the “random”.
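
      As a rough illustration of what I mean by leaning on the “constrained” (a hypothetical packet class, not from any real testbench):

        // Wide-open randomization explores the space slowly; a tight constraint
        // aims the test at exactly what we want to test right now.
        class packet;
          rand bit [7:0] length;
          rand bit [1:0] priority_lvl;
          rand bit       has_crc_error;

          constraint short_high_priority_c {
            length        inside {[1:4]}; // minimum-size packets only
            priority_lvl  == 2'b11;       // highest priority
            has_crc_error == 0;           // error injection belongs in a separate test
          }
        endclass

        module tb;
          initial begin
            packet p = new();
            repeat (10) begin
              if (!p.randomize()) $fatal(1, "randomize failed");
              $display("len=%0d pri=%0d", p.length, p.priority_lvl);
            end
          end
        endmodule

      Still constrained random, but the constraints do most of the work so the feedback comes back quickly.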

      Another thing I think your son is suggesting implicitly is that we hardware folks have come to see long (long, long) simulation times as acceptable. I don’t like getting into conversations about tools when I talk about agile because I think people start narrowing in on solutions instead of keeping the big picture in mind, but if people find value in the rapid feedback cycles and start using techniques like TDD, we’re probably going to need a minor tool revolution to enable them. Example: I just farmed out a ‘$display(“hello world”)’ that took 20sec. It’s hard for us to do anything in 6sec.

      neil

  2. “So … why don’t we just test what we want to test ? Does constrained random contradict TDD in theory ?”

    Don’t want to speak about the TDD (honestly, I still don’t completely understand this term). In my opinion, it just contradicts the natural testing process.
    The CR OO-based verification philosophy is all about smart constrained-random testbench development. The commonly used marketing slogan is: invest a couple of months into your testbench, and then just click to run a regression, tune, report bugs, click again and so on. In theory, this process looks attractive. In practice, such a testbench development process goes out of sync with the natural test development sequence: start with basic directed sanity tests, add other “corner-case” directed tests, add some randomization and so on. It requires jumping right into the constrained-random testing phase with a testbench that is quite complicated and, in many cases, untested itself.

    In my opinion, recent verification methodologies have developed a gap between designers and verification engineers. Verifiers raised their level of abstraction, dealing with object hierarchies that are an alternative to the design hierarchy, base classes that are reusable for the whole world (and therefore complex) and so on. But designers just remained with their RTL coding. In most cases, they are unable to work with RTL and at the same time climb to the level of abstraction verification engineers deal with. Then, they have to write hundreds of untested code lines waiting for the “magic OO-CR testbenches” provided by the verification engineer.

    Now a few words about Agile. The basic principle of homeopathy, known as the “law of similars”, is “let like be cured by like.”
    Looks like another influence from the software world is being applied to cure the verification community of previous software-world influences.
    The hardware and software worlds are not the same, and applying methods external to the hardware world may be harmful – especially for people without good knowledge of software design.
    I would rather use common sense, thinking and dealing in the terms of our industry.

    -Alex

    1. alex, thanks for the comment. I agree with everything right up to…

      Looks like another influence from the software world is being applied to cure the verification community of previous software-world influences

      With agile, we’re not just curing verification, we’re curing everything that’s wrong with hardware development.

      (sarcasm… in case anyone missed it 🙂 )

      I should probably add a post to this effect because it’s an important note that supersedes our tdd posts… I think agile is a combination of additional perspective and new practices. To some hardware folks, the additional perspective is radical. To others, it makes perfect sense. Some agile practices are truly different, some are a more refined version of what we already do. Some practices we already use. Regardless of perspective or practice, however, I’d never suggest a wholesale replacement for what we already do in hardware. We already do a decent job. There is room for improvement though and agile brings potential for improvement. It’s largely complementary to practices we already use (ie. tdd). In some cases it’d mean rethinking/rejigging/refining practices (ie. constrained random). And yes, agile *could* help us replace some of what we do (ie. swap our cubicles for a shared workspace).

      Of course any technique used improperly can bring harm to a development team (your example of constrained random testbenches not working out as advertised is a good one, and one that I’m assuming originates from within hardware) but I’m not sure that has anything to do with where the technique comes from. I’m certainly an advocate of thinking things through and adequately critiquing any technique before using it. And after that, if it’s decided a particular technique makes sense (tdd, for example), then you use it, regardless of origin. If not, then you don’t. We’re definitely not trying to cure anything with agile or undo any damage done by previous software practices gone bad, just point out that there are other techniques and perspectives out there that are potentially worth trying. Some will work. Some won’t.

      thanks for taking the time to comment and thanks for reading! You’re bringing up great points that need to be discussed.

      neil

      1. I agree that block/sub-block verification is important. More than that – I am doing it whenever it makes sense. However, there are some problems associated with this approach, such as:

        1. Metrics.
        ———–
        In my opinion, any time complexity accumulates, it has to be verified immediately. In some cases, a half page of code may be much more “tricky” than a multi-page one. Then, verification has to be applied to such code. However, any verification work takes time from the designer, and it has to be part of the overall planning effort. So managers may be interested to know:
        -Are there any objective metrics for code complexity? In other words, how many block/sub-block testbenches have to be developed for a given design?
        -How much time is needed for each sub-block/block verification?
        Personally, I feel it is difficult to communicate with management on this topic.

        2. Reuse.
        ———
        Many companies run VMM/OVM/UVM/Specman-based verification. To maximize reuse, any sub-block/block/etc. verification has to be done with VMM/OVM/UVM. Is it practical to use complex constrained-random verification methodologies for sub-block testing with simple directed tests?

        3. Regressions.
        ———
        Regressions maintain the functional stability of a design over time. Is there a need to make block/sub-block testbenches regressionable as well? If yes, they have to contain automated checking, test status generation, and so on. Then, “agile” verification loses its agility – testbenches have to comply with certain rules, run scripts and so on.

        Would like to get any practical recommendations on these topics 😉

        1. Alex,

          If “complexity needs to be verified as it accumulates” means “code needs to be verified as it’s written” then I think we’re on the same page. As for metrics, though, I’m not sure we’re going in the same direction. The only metric I’ve seen for writing code is measuring actual time taken versus an estimate made by the person writing the code. The estimates are usually way off and of little value, especially because they can’t adequately predict debug effort. If you want an idea of how agile teams tackle progress metrics, find a few articles that talk about user stories and story points. For a book reference, I’ve read “agile estimating and planning” by mike cohn and would highly recommend it (https://www.amazon.ca/Agile-Estimating-Planning-Mike-Cohn/dp/0131479415/ref=sr_1_4?ie=UTF8&qid=1321504910&sr=8-4)

          As for reuse, common sense has to apply. If maximizing reuse actually means maximizing red tape and minimizing efficiency, then that’s not common sense. Personally, I’m so used to using vmm that quickly bringing up a testbench feels automatic to me now so I’d use vmm for simple or complicated tests. If it were another of the XXM’s, then I’d hesitate and weigh my options. That’s common sense to me but that’ll be different for everyone. Unfortunately, I think planning for reuse brings false hope that everything will turn out in the end. Even teams with an absolutely solid usage model and complete/shared understanding of testbench architecture are going to have problems with reuse. That’s why I suggest the early/incremental validation of top level by integrating code early and measuring progress at top level. I do that regardless of what XXM framework you use (or don’t use).

          As for your comments on regressions, I may have misled you somewhere if it seems I suggested automated checking and complying with rules and scripts, etc. means losing agility. The opposite is true. Automated testing procedures and tools are core to agile because the tests validate progress. I’m not really familiar with the automated test tools agile teams use, but I can point you to cucumber (http://cukes.info/) as an example of one *very* widely used automated test framework. There are others, but cucumber seems to be very popular right now from what I understand.

          neil

          1. Neil,
            Thank you for your answers.
            You wrote:

            If “complexity needs to be verified as it accumulates” means “code needs to be verified as it’s written” then I think we’re on the same page.

            The complexity of code does not directly correlate with the number of code lines. I doubt that every piece of code that is written has to be verified. Also, designers reuse previously verified code patterns, and there is probably little sense in verifying them again and again.

            Also, some pieces of design code require not just “agile” verification, but thorough verification using a combination of directed, random and formal methods. Think about an arbiter controlling access to a common resource. Correct functionality of this piece of logic, with just dozens of flops, is critical for the system.
            Applying just “software” methods to the verification of such logic is clearly not enough.

            Thank you for the link to Cucumber. It is interesting, but requires the introduction of YAL (Yet Another Language).
            Speaking about what is needed to make tests regressionable… just compress all test results into a pass/fail status bit and let the regression script understand it. For directed tests, it is just a comparison between expected and actual results. For random ones, there is a need to develop testbench infrastructure – monitors, scoreboards, checkers – and connect all of them into one error reporting system and so on. Then, we introduce a certain testbench development methodology, script support and so on. Do you consider such cases part of “Agile”? Or is Agile all about quick directed testing with expected results?

            Regards,
            -Alex

          2. Alex,

            I still get the feeling that you think I’m suggesting we replace our hardware testing practices with software practices. That’s not where I’m going. Pertinent to this discussion, TDD in the software world has been a valuable tool for bug prevention. I think it could be equally valuable in hardware development when used to complement the practices we already follow. Next, I believe that adopting TDD will change the way we think about practices we already follow, namely block level testing, which I talk about in the original post. Exactly how they change I don’t know… I can only speculate for now. Further, TDD isn’t the only testing that happens on agile teams because, I’m assuming, that wouldn’t be enough for software testing either. Nor is agile just about quick directed testing. Don’t confuse the tight feedback loops you get from TDD with quick (aka: sloppy, inadequate) work. Tight feedback loops just mean you get results faster, not that the results are simpler or any less rigorous. Finally, there are no requirements that agile development be devoid of monitors, scoreboards, testbench methodology, a single pass/fail status bit and/or all the other things we depend on currently. Perhaps we rethink *why or how* we use all this stuff (ie. thinking in terms of incremental development has made me wonder why we find waiting so long for initial results from a constrained random testbench acceptable). Ultimately, we should be doing what makes sense which means there’s certainly no requirement to unlearn what we’ve learned and start from scratch.
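
            To be concrete about the pass/fail point, here’s the kind of minimal sketch I have in mind; quick and directed, but still self-checking and still collapsing into a single status a regression script can pick up (the saturating adder DUT and all the names are invented for illustration):

              module quick_directed_test;
                int errors = 0;

                logic [7:0] a, b, sum;
                sat_add8 dut (.a(a), .b(b), .sum(sum));

                task automatic check(input [7:0] ia, ib, expected);
                  a = ia; b = ib; #1;
                  if (sum !== expected) begin
                    errors++;
                    $display("FAIL: %0d + %0d -> %0d, expected %0d", ia, ib, sum, expected);
                  end
                endtask

                initial begin
                  check(8'd1,   8'd2,   8'd3);    // normal case
                  check(8'd200, 8'd100, 8'd255);  // saturation case
                  // one line for the regression script to grep
                  if (errors == 0) $display("TEST PASSED");
                  else             $display("TEST FAILED");
                  $finish;
                end
              endmodule

            Tight feedback loop, automated checking and a single pass/fail the regression understands – none of that is at odds with being agile.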

            Thanks again for the challenging questions! I appreciate it!

            neil

        2. Alex,

          Comment on:

          “Then, “agile” verification loses its agility – testbenches have to comply with certain rules, run scripts and so on.”

          I do most of my designs for FPGAs using VHDL. Since I haven’t found any unit test frameworks for VHDL I decided to develop one of my own. One of my main philosophies when doing this was to enable full TDD design with automated and potentially complex testbenches while still making quick and “simple” testing easy.

          The workflow goes something like this:

          1. First I use a code generator to generate the unit test framework for the design I want to test. If I’m using TDD the initial design is just an interface, and if I’m coding “old style” I have a complete design to start with. The most noticeable thing about the generated testbench is that it’s structured around test cases. If you don’t want to use that concept you just put all your test code within a single test case.

          2. Start your simulator, compile your code and run your simulation. Verify the design using waveforms or whatever method you like. The code generator also generates a compile script which checks the dependencies of the code and compiles what’s necessary in every coding/testing iteration. This is a speedup if you’re used to a compile-everything strategy. This is all you do if you’re testing without automation in mind. Otherwise…

          3. Add assertions to your test code unless you did that before. The assertions come with the framework and enable it to track the success/failure of the test cases, generate test reports etc.

          4. The testbench is now automated and you can run a simulation script generated by the code generator to start the automated test.

          5. The generated simulation script can be called from a continuous integration (CI) tool to “schedule” a test run of the latest code checked out from your SCM tool. I use the open source Hudson/Jenkins for CI and the test reports generated by my simulations integrate nicely into these tools to enable easy follow up/statistics on bugs etc.

          Now to your concern. Is this fast? Well, I did a test starting with the simplest completed design I can imagine – an inverter. From that I generated my testbench framework, added an automated test case, ran the simulation script and ended up with a test report within 50 seconds, of which at least 10 seconds belong to starting up the simulator. I think this is at least as fast as any traditional approach, even if you don’t use TDD and only make use of the automation.

          1. Lars,

            Thanks for sharing your experience.
            I agree that investing in your own verification methodologies and infrastructure makes things much faster. However, it makes things faster personally for you, and if you’re able to market it successfully within your company – for your company. That’s it.

            Is it possible to develop a common methodology, or will everybody remain with his/her own in-house developments? That’s a question.
            And if we try to accommodate a block-level methodology to suit the needs of the whole universe, we may end up raising complexity to the level of UVM…

            And yet another concern. Under the constant pressure of “standard” verification methodologies, is there a way to develop an alternative, commonly used verification methodology, at least for the block level? Do the “big guys” embrace such competition? Potentially, it may contradict their claims of providing a methodology that is truly “Universal”.

            Regards,
            -Alex

          2. lars, thanks for sharing some details from your tdd flow and framework. great to have hardware folks with tdd experience chime in.

            neil

          3. Alex,

            “However, it makes things faster personally for you, and if you’re able to market it successfully within your company – for your company. That’s it.”

            So you do agree it is a good way of doing things…

            But you do touch on a very important question. If we accept TDD but must rely on the “big guys” to make it happen then we’re stuck unless they feel they want to provide the solutions and we can afford it. Most FPGA-using companies I know can’t even afford a simulator license for SystemVerilog.

            The software world is very different. If I want a unit testing framework I can download JUnit for Java, CPPUnit for C++, CUnit for C,… and a bunch of other good and more or less complex tools for free. So I don’t think the complexity is the problem. If the community thinks something is good it will provide the solutions, if it has the skills to do so. The latter is the problem for most hardware guys and hence we see very few good, widely used open source tools in the HW community.

            The tools I have are used by my company and the companies we work for (we’re consultants) and we could probably make them available to everyone as open source. But that would only be interesting if I thought the hardware community would contribute to making them better. So yes, this is a problem.

  3. I’m a designer who’s mainly worked at small companies/startups where the distinction between design and verification is never as clear-cut as it seems to be in larger organisations. I guess I’ve been doing a kind of TDD for years, without actually being aware of it as a methodology. I’m glad to have my methods validated; however, I do have one question regarding how PD (physical design) fits into Agile/TDD.

    Agile seems to be about incrementally delivering functionality, and about having that functionality be correct at each incremental delivery. However for PD to start it doesn’t really matter if the functionality is correct, but it does require code which is representative of the final design. Won’t PD be less meaningful if it’s started with a code base that only implements a subset of the final functionality? Sure there are things that can start before the RTL is frozen, but it’s not clear to me that if we have 50% of the functionality implemented and tested that we can concurrently have 50% of the PD “DONE” too.

    Perhaps this doesn’t matter if we find fewer functional problems (thanks to TDD) later in the design flow?

    1. Rick, thanks for reading and thanks for the comments.

      It’s going to take some thought and experience to see how incremental development and pd fit together. I’ve seen situations where functionality takes priority over pd, which has led to big problems with pd (late problems with architecture/timing/etc). I’ve also seen situations where pd takes priority over functionality and that’s led to similarly big problems (a huge pile of buggy code that’s almost impossible to debug). I’ve got very little experience with pd so I can’t do much more than speculate to initiate the discussion… but it seems to me that the right approach is somewhere in between, where we incrementally roll out functionality while supplementing that functionality with memory/gate estimates, skeleton architectures, and whatever else the pd team can use to make progress. Tangible incremental goals for the pd should ultimately be part of the discussion also, but me suggesting what those would be is way beyond my expertise.

      It’d definitely help to have a few pd experts take an interest to turn the speculation into more meaningful conversation.

      For your last comment, I think having fewer functional bugs will definitely mitigate risk for both verification and pd. It doesn’t take a very big “oopsy” to cause headaches downstream so the fewer of those we have, the better.

      thanks again for the comments!

      neil
