Improving Product Robustness 101
Improving product robustness is straightforward but difficult. Here’s how to do it.
Identify specific failure modes, prioritize them, and go after the biggest ones first. Failure modes can be identified from several sources. Warranty data is sometimes coded by failure mode (more precisely, symptom type), so start there. The number one failure mode in this type of data is typically “no problem found”, so be ready for it. Analysis of the actual products that come back is another good source: returned product is routed to the appropriate engineer, who analyzes it and enters the failure mode into a database.
A formal design FMEA generates a list of failure modes prioritized by risk priority number (RPN), where larger is more important. To do this, engineers are hauled into a room and a facilitator helps them come up with potential failure modes. One caution – the process can generate many failure modes, more than you can fix, so make the top five or ten go away and don’t argue the bottom fifty. It makes no sense to even talk about number eleven if you haven’t fixed the top ten.
But the best way I have found to identify failure modes (problems) that are meaningful to the customer is to ask the technical services group for their top five things to fix. They will give you the right answer because they interact daily with customers who have broken product. They won’t expect you to listen to them (you never listened before), so surprise them by fixing one or two on their list. They will be grateful you listened (they’ll likely want to buy you coffee for the rest of your career) and your customers will notice.
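As a rough illustration of the FMEA prioritization described above, here is a minimal Python sketch. It assumes the conventional RPN definition (severity times occurrence times detection, each rated 1 to 10). The failure mode names and ratings are made up for the example, and the FailureMode class and top_failure_modes helper are hypothetical names, not part of any FMEA tool.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1-10, 10 = worst effect on the customer
    occurrence: int  # 1-10, 10 = most frequent
    detection: int   # 1-10, 10 = hardest to detect before shipping

    @property
    def rpn(self) -> int:
        # Conventional risk priority number: larger means more important.
        return self.severity * self.occurrence * self.detection


def top_failure_modes(modes, n=5):
    """Return the n failure modes with the largest RPN, biggest first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)[:n]


# Hypothetical ratings, for illustration only.
modes = [
    FailureMode("seal leak", 8, 3, 4),
    FailureMode("label peeling", 2, 7, 2),
    FailureMode("connector corrosion", 7, 5, 6),
    FailureMode("fan bearing wear", 5, 6, 3),
]

for mode in top_failure_modes(modes, n=2):
    print(f"{mode.name}: RPN {mode.rpn}")
```

The point of the cutoff n is the same as in the text: work the short list at the top and leave the long tail alone until the top is fixed.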
Once failure modes are identified, define the physics of failure – why the product breaks. This is tough work and requires focused thought and analysis. If, when you break the product, it “looks like” the ones coming back from the field, you have defined the physics of failure. This is the same thing as replicating the problem in the lab. Once that’s defined, create an automated test rig or experimental setup that breaks the product in a way that captures the physics of failure. I call this test rig a robustness surrogate because it stands in for the actual failure mode seen in the field. The robustness surrogate should break the product as fast as possible while retaining the physics of failure so you can break it and fix it many times before product launch. The robustness surrogate should be designed to break the product within minutes, not hours or days – the faster the better.
To know whether product robustness has improved, first break the baseline (existing) design on the robustness surrogate. The new design must survive longer on the robustness surrogate than the baseline design. The result is A/B data (baseline design / new design) that is presented at the design review using a simple bar graph format which I call big-bar-little-bar. Keep improving robustness of the new design even if it outperforms the baseline design by a factor of ten – that’s not good enough for your customers.
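The A/B comparison can be sketched in a few lines of Python. This is only an illustration under assumptions: the cycles-to-failure numbers are invented, the median is my choice of summary statistic, and improvement_factor and text_bar are hypothetical helper names.

```python
from statistics import median


def improvement_factor(baseline_cycles, new_cycles):
    """Ratio of median survival on the surrogate: new design / baseline design."""
    return median(new_cycles) / median(baseline_cycles)


def text_bar(label, value, max_value, width=40):
    """Render one bar of a big-bar-little-bar chart as plain text."""
    length = round(value / max_value * width)
    return f"{label:<10}{'#' * length} {value}"


# Hypothetical cycles-to-failure on the robustness surrogate.
baseline = [120, 150, 135, 110, 140]
new_design = [900, 1100, 1000, 950, 1200]

factor = improvement_factor(baseline, new_design)
biggest = max(median(baseline), median(new_design))

print(text_bar("baseline", median(baseline), biggest))
print(text_bar("new", median(new_design), biggest))
print(f"improvement factor: {factor:.1f}x")
```

With only a handful of samples per design, the ratio is a rough indicator and not a statistical claim; a large, obvious gap between the bars is what makes the chart persuasive at a design review.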
Don’t stop improving robustness until you run out of time, and don’t stop if you meet the arbitrary MTBF specification. Customers like improved robustness, and in this case too much of a good thing is wonderful.
Using this method, I reduced warranty cost per unit by 75% over a five-year period. It worked.
Thank you for this. No doubt these techniques can be useful and certainly better than much of the current practice.
But why wait to reduce warranty costs until the losses start occurring? An appropriate application of systems engineering, http://www.incose.org, up front can preclude most of the warranty claims (and the not as visible but much more pernicious badmouthing). Further, FMEA, properly done, focuses on system integrity balancing, a more insightful technique than risk-oriented analysis. Even more powerful is to consider not only failures but all causes of less than optimal performance throughout all conditions of operation. Some people now refer to this as resiliency rather than robustness, http://www.incose.org/practice/techactivities/wg/rswg/
Anyway, keep up the good work and strive to become continually better.
Thanks, Jack. Your comment adds much-needed substance to the argument. Mike
This is a sensible approach to achieving improved QRD.
I especially like your thinking re “robustness surrogates”. One form of “broken” product is unacceptable degraded performance, which I think is assumed in your article. I expect my car’s accessories will get noisier at high mileage, but if they degrade too fast and I’m within the warranty period, I’m going to take it in for repair, even if the intended function remains intact (I still have full power steering assist, for example). You can apply your same robustness surrogate thinking to degraded performance as well.
In these cases you’re not producing a “broken” product, but you still need to proceed cautiously in reproducing the field failure. One approach is degradation analysis (http://www.weibull.com/LifeDataWeb/degradation_analysis.htm). Say you’ve determined a threshold of acceptable degraded performance (perhaps via the Quality Loss Function) and know the stresses and cycles that would simulate 1X life. Your objective is to simulate the field failure as quickly as possible. As you say, you want to produce the failure in minutes or hours. If you understand the physics of failure (or degradation), carefully select experimental accelerating stress factors and levels (you want to hatch the chick faster… no hard-boiled eggs!), and apply an appropriate life model to the degradation test data, you can EXTRAPOLATE when actual 1X (or higher) life occurs. You would validate your degradation analysis by continuing to degrade some parts until a complete duplication of the field failure is realized (confirmed via a method such as Design Review Based on Test Results – DRBTR). Now, you’re home free. You won’t have to precisely duplicate field failure during optimization because the degradation analysis gave you good anticipatory ability, allowing you to efficiently verify design modifications targeted to improve QRD.
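To make the extrapolation step concrete, here is a minimal Python sketch. It assumes the degradation measure grows linearly with cycles, which is only one of several possible degradation models, and it leaves out the life-stress model needed to translate accelerated-test cycles into field life. The noise readings, the threshold, and the function names are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cycles_to_threshold(cycles, readings, threshold):
    """Extrapolate when a linearly degrading measure reaches the threshold."""
    slope, intercept = fit_line(cycles, readings)
    if slope <= 0:
        raise ValueError("no upward degradation trend; cannot extrapolate")
    return (threshold - intercept) / slope


# Hypothetical accessory noise (dB) measured at intervals during the test.
cycles = [0, 100, 200, 300, 400]
noise_db = [40.0, 42.0, 44.0, 46.0, 48.0]
threshold_db = 60.0  # assumed limit of acceptable degraded performance

predicted = cycles_to_threshold(cycles, noise_db, threshold_db)
print(f"predicted threshold crossing at about {predicted:.0f} cycles")
```

As the comment says, the prediction still has to be validated by running some parts all the way to the threshold; the fit only tells you where to expect the crossing.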
Hope this adds some value to the conversation…