An experimental study shows that developers often misperceive which software testing techniques are most effective for them.

Study Finds Software Testers Often Misjudge Which Techniques Work Best

2025/12/16 05:00

Abstract

1 Introduction

2 Original Study: Research Questions and Methodology

3 Original Study: Validity Threats

4 Original Study: Results

5 Replicated Study: Research Questions and Methodology

6 Replicated Study: Validity Threats

7 Replicated Study: Results

8 Discussion

9 Related Work

10 Conclusions And References


4 Original Study: Results

Of the 32 students participating in the experiment, nine did not complete the questionnaire and were removed from the analysis. Table 9 shows the balance of the experiment before and after participants submitted the questionnaire. G6 is the most affected group, with four participants missing.

Appendix B shows the analysis of the experiment. The results show that program and technique are statistically significant (and are therefore influencing effectiveness), while group and the technique-by-program interaction are not. As regards the techniques, EP shows the highest effectiveness, followed by BT and then CR. These results are interesting, as all techniques are able to detect all defects. Additionally, more defects are found in ntree than in cmdline and nametbl, where the same number of defects is found.

Note that ntree is the program used on the first day and has the highest Halstead metrics; it is neither the smallest program nor the one with the lowest complexity. These results suggest that:

– There is no maturation effect. The program where the highest effectiveness is obtained is the one used on the first day.

– There is no interaction-with-selection effect. Group is not significant.

– Mortality does not affect experimental results. The analysis technique used (Linear Mixed-Effects Models) is robust to lack of balance.

– Order of training could be affecting results. The highest effectiveness is obtained with the last technique taught, while the lowest effectiveness is obtained with the first technique taught. This suggests that techniques taught last are more effective than techniques taught first, possibly because participants remember the most recently taught techniques better.

– Results cannot be generalised to other subject types.

4.1 RQ1.1: Participants’ Perceptions

Table 10 shows the percentage of participants that perceive each technique to be the most effective. We cannot reject the null hypothesis that the frequency distribution of the responses to the questionnaire item (Using which technique did you detect most defects?) follows a uniform distribution (χ²(2, N=23)=2.696, p=0.260). This means that the number of participants perceiving a particular technique as the most effective cannot be considered different across the three techniques. Our data do not support the conclusion that some techniques are perceived as the most effective more frequently than others.
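To make the goodness-of-fit test above concrete, the following is a minimal sketch in Python. The response counts are hypothetical (they are not the values of Table 10, which is not reproduced here); they were picked only because they sum to 23 and give a statistic close to the reported one. The function names are illustrative, and the closed-form p-value is valid only for two degrees of freedom, which is the case for three techniques.

```python
import math


def chi_square_uniform(counts):
    """Goodness-of-fit statistic of observed counts against a uniform distribution."""
    n = sum(counts)
    expected = n / len(counts)  # same expected count for every category
    return sum((observed - expected) ** 2 / expected for observed in counts)


def p_value_df2(statistic):
    """Chi-square survival function; this closed form holds only for 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Hypothetical counts for EP, BT, CR (an assumption, NOT the data of Table 10).
counts = [10, 9, 4]
stat = chi_square_uniform(counts)
p = p_value_df2(stat)
print(f"chi2(2, N={sum(counts)}) = {stat:.3f}, p = {p:.3f}")
```

A p-value above the usual 0.05 threshold, as here, means the uniform distribution cannot be rejected, which is the situation the paragraph describes.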

4.2 RQ1.2: Comparing Perceptions with Reality

Table 11 shows the value of kappa along with its 95% confidence interval (CI), overall and for each technique separately. We find that all values for kappa with respect to the questionnaire item (Using which technique did you detect most defects?) are consistent with lack of agreement (κ<0.4, poor). Although the upper bounds of the 95% CIs show agreement, 0 belongs to all of the 95% CIs, meaning that agreement by chance cannot be ruled out. Therefore, our data do not support the conclusion that participants correctly perceive the most effective technique for them.

It is worth noting that agreement is higher for the code reading (CR) technique (the upper bound of the 95% CI in this case shows excellent agreement). This could be attributed to participants being able to remember the actual number of defects identified in code reading, whereas for the testing techniques they only wrote the test cases. On the other hand, participants do not know the number of defects injected in each program.
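As an illustration of the agreement measure discussed above, here is a minimal sketch of Cohen's kappa between the technique each participant perceives as most effective and the technique that actually was. The four participants are invented for the example and the function name is an assumption; the study's per-technique kappas and confidence intervals are not reproduced.

```python
from collections import Counter


def cohen_kappa(perceived, actual):
    """Cohen's kappa: agreement between two label lists, corrected for chance."""
    if len(perceived) != len(actual) or not perceived:
        raise ValueError("both lists must be non-empty and of equal length")
    n = len(perceived)
    # Proportion of participants whose perception matches reality.
    observed = sum(p == a for p, a in zip(perceived, actual)) / n
    # Agreement expected by chance, from the marginal frequencies.
    p_counts, a_counts = Counter(perceived), Counter(actual)
    expected = sum(p_counts[c] * a_counts[c] for c in p_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical data (not from the study): one participant per position.
perceived = ["EP", "EP", "BT", "CR"]
actual = ["EP", "BT", "BT", "CR"]
kappa = cohen_kappa(perceived, actual)
print(f"kappa = {kappa:.3f}")
```

Values near 0 indicate agreement no better than chance, which is why a confidence interval containing 0 prevents any claim of agreement.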

As lack of agreement cannot be ruled out, we examine whether the perceptions are biased. The results of the Stuart-Maxwell test show that the null hypothesis of marginal homogeneity cannot be rejected (χ²(2, N=23)=1.125, p=0.570). This means that we cannot conclude that perceptions and reality are differently distributed. Taking into account the results reported in Section 4.1, this would suggest that, in reality, the number of times each technique is the most effective cannot be considered different.

Additionally, the results of the McNemar-Bowker test show that the null hypothesis of symmetry cannot be rejected (χ²(3, N=23)=1.286, p=0.733). This means that we cannot conclude that there is directionality when participants’ perceptions are wrong. These two results suggest that participants are no more mistaken about one technique than about the others. Techniques are not differently subject to misperceptions.
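The symmetry test mentioned above can be sketched as follows. This is an illustrative implementation of the McNemar-Bowker statistic on a square perceived-versus-actual table, not the authors' analysis script. The 3x3 table is hypothetical, the function names are assumptions, degrees of freedom are counted here as the number of off-diagonal pairs with at least one observation, and the p-value helper is a closed form valid only for three degrees of freedom.

```python
import math


def bowker_statistic(table):
    """McNemar-Bowker symmetry statistic and degrees of freedom for a square table."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total > 0:  # empty pairs carry no information and are skipped
                stat += (table[i][j] - table[j][i]) ** 2 / total
                df += 1
    return stat, df


def chi2_sf_df3(x):
    """Chi-square survival function; this closed form holds only for 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical table (NOT the study's data): rows = perceived best (EP, BT, CR),
# columns = actual best (EP, BT, CR), 23 participants in total.
table = [
    [4, 2, 1],
    [3, 3, 2],
    [1, 3, 4],
]
stat, df = bowker_statistic(table)
print(f"chi2({df}) = {stat:.3f}, p = {chi2_sf_df3(stat):.3f}")
```

A large p-value means that errors in one direction (say, perceiving EP when BT was best) are not clearly more frequent than errors in the opposite direction.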

4.3 RQ1.3: Comparing the Effectiveness of Techniques

We now check whether the misperceptions could be due to participants detecting the same number of defects with all three techniques, which would make it impossible for them to make the right decision. Table 12 shows the value and 95% CI of Krippendorff’s α, overall and for each pair of techniques, for all participants and for every design group (participants that applied the same technique on the same program) separately. Table 13 shows the value and 95% CI of Krippendorff’s α, overall and for each program/session.

For the values computed with all participants, we can rule out agreement, as the upper bounds of the 95% CIs are consistent with lack of agreement (α<0.4), except for EP-BT and nametbl-ntree, for which the upper bounds of the 95% CIs are consistent with fair to good agreement. However, even in these two cases, 0 belongs to the 95% CIs, meaning that agreement by chance cannot be ruled out.

This means that participants do not obtain effectiveness values so similar across the different techniques (or the different programs) that it would be difficult to discriminate among techniques/programs.

Furthermore, the α values are negative, which indicates disagreement. This is good for the study, as it means that participants should be able to discriminate among techniques, and the lack of agreement between perceptions and reality cannot be attributed to the techniques being impossible to tell apart. As regards the results for groups, although α values are negative, the 95% CIs are too wide to show reliable results (due to the small sample size). Note that in most cases they range from disagreement at the lower bound (α<-0.4) to agreement at the upper bound (α>0.4).

4.4 RQ1.4: Cost of Mismatch

Table 14 and Figure 1 show the cost of mismatch. The EP technique has fewer mismatches than the other two, and its mean and median mismatch costs are smaller. The BT technique, on the other hand, has more mismatches and a higher dispersion. The results of the Kruskal-Wallis test reveal that we cannot reject the null hypothesis of techniques having the same mismatch cost (H(2)=0.685, p=0.710). This means that we cannot claim a difference in mismatch cost between the techniques. The estimated mean mismatch cost is 31pp (median 26pp).

These results suggest that the mismatch cost is not negligible (31pp), and is not related to the technique perceived as most effective. However, note that the existence of very high mismatches and few datapoints could be affecting these results.
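For readers unfamiliar with the Kruskal-Wallis test used in this section, the sketch below computes the H statistic with the standard tie correction. The mismatch costs are invented for illustration (they are not the values behind Table 14), the function names are assumptions, and the p-value shortcut applies only to three groups, where H is compared against a chi-square distribution with two degrees of freedom.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical mismatch costs in percentage points, grouped by the technique
# perceived as most effective (assumed values, NOT the study's data).
costs_by_technique = {"EP": [10, 20, 30], "BT": [15, 40, 60, 25], "CR": [20, 35, 5]}
h = kruskal_wallis(list(costs_by_technique.values()))
p = math.exp(-h / 2)  # valid only for 3 groups (2 degrees of freedom)
print(f"H(2) = {h:.3f}, p = {p:.3f}")
```

Because the test works on ranks rather than raw values, it is less sensitive to the few very high mismatches noted above than a comparison of means would be, although small samples still limit its power.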

4.5 RQ1.5: Expected Loss of Effectiveness

Table 15 shows the average loss of effectiveness that should be expected in a project, where typically different testers participate, and therefore, there would be both matches and mismatches. Again, the results of the Kruskal-Wallis test reveal that we cannot reject the null hypothesis of techniques having the same expected reduction in technique effectiveness for a project (H(2)=1.510, p=0.470). This means we cannot claim a difference in project effectiveness loss between techniques. The mean expected loss in effectiveness in the project is estimated as 15pp.

These results suggest that the expected loss in effectiveness in a project is not negligible (15pp), and is not related to the technique perceived as most effective. However, we must note again that the existence of very high mismatches for BT and few datapoints could be affecting these results.
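The relation between the two figures reported in Sections 4.4 and 4.5 can be illustrated with a small sketch. It assumes one plausible reading of the metrics: the mismatch cost is the gap, in percentage points, between a participant's actually most effective technique and the one they perceive as most effective, averaged over mismatching participants only, while the expected project loss averages the same gap over all participants, counting matches as zero. The participants, their effectiveness values and the function name are all hypothetical.

```python
def mismatch_cost(effectiveness, perceived):
    """Percentage points lost by using the perceived-best technique
    instead of the technique that was actually most effective."""
    return max(effectiveness.values()) - effectiveness[perceived]


# Hypothetical participants (NOT the study's data): effectiveness (%) per
# technique and the technique each one perceives as most effective.
participants = [
    ({"EP": 80.0, "BT": 60.0, "CR": 40.0}, "EP"),  # match, cost 0
    ({"EP": 70.0, "BT": 50.0, "CR": 30.0}, "BT"),  # mismatch, cost 20
    ({"EP": 40.0, "BT": 90.0, "CR": 50.0}, "CR"),  # mismatch, cost 40
    ({"EP": 55.0, "BT": 35.0, "CR": 65.0}, "CR"),  # match, cost 0
]

costs = [mismatch_cost(eff, perceived) for eff, perceived in participants]
# A perceived technique that ties with the best one counts as a match here.
mismatches = [c for c in costs if c > 0]

mean_mismatch_cost = sum(mismatches) / len(mismatches) if mismatches else 0.0
expected_project_loss = sum(costs) / len(costs)

print(f"mean mismatch cost: {mean_mismatch_cost:.0f}pp")
print(f"expected project loss: {expected_project_loss:.0f}pp")
```

Under this reading, the project-level loss is always at most the mismatch cost, since participants whose perception is correct dilute the average.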

4.6 Findings of the Original Study

Our findings are:

– Participants should not base their decisions on their own perceptions, as their perceptions are not reliable and have an associated cost.

– We have not been able to find a bias towards one or more particular techniques that might explain the misperceptions.

– Participants should have been able to identify the different effectiveness of techniques.

– Misperceptions cannot be put down to experience. The possible drivers of these misperceptions require further research. Note that these findings cannot be generalised to types of developers other than those with the same profile as the participants in this study.

:::info Authors:

  1. Sira Vegas
  2. Patricia Riofrío
  3. Esperanza Marcos
  4. Natalia Juristo

:::

:::info This paper is available on arxiv under CC BY-NC-ND 4.0 license.

:::
