New Technique Of Selective Gradient Masking Localizes Suspected Harmful AI-Based Mental Health Knowledge And Renders It Expungable

2025/12/15 17:06

A new technique to cope with bad knowledge at the front-end of training an LLM.


In today’s column, I examine the disturbing issue of generative AI and large language models (LLMs) containing harmful mental health knowledge at the get-go. Nobody wants that. But it does occur.

Here’s how it can happen. During the initial training of the AI, there is a solid chance that some of the data being patterned on will encompass mental health advice that is outright wrong and could be harmful if repeated to people who make use of the AI. An assumption by the general public is that it would seem easy to just stop that type of adverse knowledge from being absorbed into the LLM. Period, end of story. Unfortunately, preventing the ingestion of foul knowledge is a lot harder and more vexing to deal with than it might seem at a cursory glance.

A new research approach suggests that it is feasible to localize suspected knowledge during the LLM training process and then later allow that knowledge to be neatly expunged from a specialized “forget zone”. This could be a helpful resolution to dealing with harmful mental health knowledge that otherwise would permeate the AI.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

AI And Mental Health

As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For a quick summary of some of my posted columns on this evolving topic, see the link here, which briefly recaps about forty of the over one hundred column postings that I’ve made on the subject.

There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors, too. I frequently speak up about these pressing matters, including in an appearance last year on an episode of CBS’s 60 Minutes, see the link here.

Background On AI For Mental Health

I’d like to set the stage on how generative AI and large language models (LLMs) are typically used in an ad hoc way for mental health guidance. Millions upon millions of people are using generative AI as their ongoing advisor on mental health considerations (note that ChatGPT alone has over 800 million weekly active users, a notable proportion of which dip into mental health aspects, see my analysis at the link here). The top-ranked use of contemporary generative AI and LLMs is to consult with the AI on mental health facets; see my coverage at the link here.

This popular usage makes abundant sense. You can access most of the major generative AI systems for nearly free or at a super low cost, doing so anywhere and at any time. Thus, if you have any mental health qualms that you want to chat about, all you need to do is log in to AI and proceed forthwith on a 24/7 basis.

There are significant worries that AI can readily go off the rails or otherwise dispense unsuitable or even egregiously inappropriate mental health advice. Banner headlines in August of this year accompanied the lawsuit filed against OpenAI for their lack of AI safeguards when it came to providing cognitive advisement.

Despite claims by AI makers that they are gradually instituting AI safeguards, there are still a lot of downside risks of the AI doing untoward acts, such as insidiously helping users in co-creating delusions that can lead to self-harm. For my follow-on analysis of details about the OpenAI lawsuit and how AI can foster delusional thinking in humans, see my analysis at the link here. As noted, I have been earnestly predicting that eventually all of the major AI makers will be taken to the woodshed for their paucity of robust AI safeguards.

Today’s generic LLMs, such as ChatGPT, Claude, Gemini, Grok, and others, are not at all akin to the robust capabilities of human therapists. Meanwhile, specialized LLMs are being built to presumably attain similar qualities, but they are still primarily in the development and testing stages. See my coverage at the link here.

Harmful Mental Health Knowledge

Let’s consider a disconcerting facet of how generative AI can give out not only bad mental health guidance but even potentially harmful advice. I will detail the LLM setup process that can be a source of this dour gambit.

When initially training an LLM, AI developers have the AI widely scan across the Internet to find text to be patterned on. All manner of text is utilized. There are zillions of stories, narratives, books, blogs, poems, and the like that are being scanned. The AI uses the text to populate a large-scale artificial neural network (ANN) with patterns of how humans use text and write about all areas of human knowledge. For more details on how this process works, see my coverage at the link here.
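
For readers who want the gist in code form, the patterning process boils down to next-token prediction over the scanned text. The following toy Python sketch (character-level, with a single sentence standing in for the Internet-scale corpus) shows the shape of the loop, not any production pipeline:

```python
import torch
import torch.nn as nn

# Toy illustration of pretraining as next-token prediction: the model is
# repeatedly nudged to predict each character from the one before it.
# Real LLMs use subword tokens, transformer layers, and vastly more text,
# but the overall shape of the loop is the same.
text = "Seeking therapy and working through distress is a sound option. "
vocab = sorted(set(text))
idx = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([idx[ch] for ch in text])

model = nn.Sequential(nn.Embedding(len(vocab), 32), nn.Linear(32, len(vocab)))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(100):
    logits = model(data[:-1])                             # predict the next character
    loss = nn.functional.cross_entropy(logits, data[1:])  # from the current one
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Everything the model “knows” ends up smeared across those trained parameters, which is precisely why surgically removing one piece of knowledge later is so difficult.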

Consider the nature of assorted mental health advice that gets posted on the Internet. Some of the mental health knowledge is mindfully posted by cognitive researchers and practicing therapists, psychologists, psychiatrists, and so on. This is usually relatively thoughtful and abides by principles and ethics underlying mental health advisement. You can usually rely upon such bona fide content.

But do you believe that all online guidance about mental health is completely aboveboard and safe as can be?

I am sure you know that there is posted mental health advice that is absolutely rotten and full of blarney. People will post the most vile recommendations. Anyone who has an opinion, but no factual backing, can just write whatever they want to say. Sadly, sometimes they post commentary that could be harmful if closely adhered to.

The Dilemma At Hand

One perspective is that any unsuitable mental health knowledge should be entirely prevented from getting into the AI. Just determine during the scanning process whether the encountered content is adverse and then skip it. Do not use foul text during the patterning effort.
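
To make that skip-it baseline concrete, here is a minimal Python sketch, assuming some content screen exists; the `looks_harmful` check below is a hypothetical keyword stand-in for whatever screening an AI maker might actually employ:

```python
# Minimal sketch of the "just skip it" baseline: screen each scanned
# document and drop anything deemed adverse before it ever reaches the
# training set. The screen shown is a hypothetical keyword check.

def looks_harmful(document: str) -> bool:
    red_flags = (
        "depression is caused by laziness",
        "keep them fully suppressed",
    )
    return any(flag in document.lower() for flag in red_flags)

def filter_corpus(scanned_documents: list[str]) -> list[str]:
    # Documents failing the screen are never patterned on at all.
    return [doc for doc in scanned_documents if not looks_harmful(doc)]

corpus = filter_corpus([
    "Working through distress with a therapist is a sound option.",
    "Depression is caused by laziness.",  # silently dropped
])
```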

A difficulty with this easy solution is that it glosses over the hefty challenges at play.

First, trying to determine conclusively whether a piece of mental health knowledge is warranted or not warranted is a lot harder than might be assumed. Sure, some mental health advice is obviously out-to-lunch. But there can be mental health tidbits and ideas that are at the margins. If you exclude those aspects, the overall body of knowledge about mental health that ends up inside the LLM is potentially going to be fragmented, incomplete, and otherwise have knotty issues.

Second, knowledge in general tends to be interconnected. It is best to construe human knowledge as a weblike phenomenon. One piece of knowledge relates to another piece of knowledge. On and on this goes. The omission of a mental health aspect might be loosely tied to other elements of knowledge in far-flung domains that you do want to have fully patterned in the LLM. Failing to pattern on one hostile snippet could leave dangling lots of other genuine snippets that go far beyond the confines of mental health.

Third, there is a dual-use consideration that must be dealt with. Allow me to explain. Suppose a mental health researcher sought to expose bad mental health advice, so they wrote about it in a published journal article. The AI at pre-screening opts not to utilize that material. The downside is that the LLM could have had examples of what kind of advice not to give to people. Instead, by skipping the content, all that the AI has is presumably proper advice, but no examples of what not to say or do.

Knowledge Localization To The Rescue

A clever approach has been devised to try and deal with these circumstances. It is a generalized approach that I view as being quite applicable to the domain of mental health knowledge.

The approach is as follows. During the scanning process, attempt to ascertain whether any of the mental health knowledge is of a suspicious nature. A given aspect might not be utterly out of sorts, leaving you unsure of whether it ought to be patterned on or not. You don’t want to skip it, nor do you want to pattern on it in any permanent sense.

The idea is that you go ahead and pattern on it, doing so with cautionary flagging involved. Inside the LLM, you are aiming to localize the knowledge. Mark it so that it is something that is believed to be suspicious. Meanwhile, allow the AI to continue patterning. Keep flagging any additional mental health knowledge that appears to be dubious.

All in all, you are placing the questionable knowledge into a kind of “forget zone”. You can later decide to expunge the content in that zone. Since you flagged the knowledge at the get-go, you have also kept tabs on what else depends on the patterned elements. This allows you to somewhat cleanly roll the “forget zone” aspects out of the AI without undercutting the rest of the AI.
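
In contrast to the filtering sketch earlier, here is a minimal sketch of flag-and-continue, again assuming some suspicion screen exists; the `looks_suspicious` heuristic and the tag names are my own illustrative assumptions rather than anything prescribed by the research:

```python
# Sketch of flag-and-continue: every document is kept, but each one is
# tagged either "retain" or "forget". The "forget" tag later steers the
# example's influence into the expungable forget zone rather than into
# the permanent body of the model. The screen is a hypothetical placeholder.

def looks_suspicious(document: str) -> bool:
    # Deliberately permissive: borderline content gets quarantined
    # rather than discarded outright.
    cues = ("caused by laziness", "push those emotions aside", "never seek therapy")
    return any(cue in document.lower() for cue in cues)

def tag_corpus(scanned_documents: list[str]) -> list[tuple[str, str]]:
    return [
        (doc, "forget" if looks_suspicious(doc) else "retain")
        for doc in scanned_documents
    ]

tagged = tag_corpus([
    "Working through distress with a therapist is a sound option.",
    "Depression is caused by laziness.",  # kept, but bound for the forget zone
])
```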

A twist to be dealt with is whether the “forget zone” content might reemerge in the LLM. You want to try and ensure that the mental health knowledge that was flagged and then somewhat removed is not going to resurface. The crux is to knock out the foul-flagged patterning in a way that it won’t readily reconstitute itself.

The overall beauty of this approach is that knowledge is included during the training process, and you have the latitude to later decide whether to get rid of it. Maybe you determine that the quarantined knowledge is perfectly fine and doesn’t need to be expunged. Great, it is already there and ready for use. If you later change your mind and want to get rid of it, that’s fine too, just expunge it when so desired.

Researchers And Experimentation

This knowledge localization approach lets you distinguish provisional knowledge from permanent knowledge and establishes epistemic quarantine layers.

The shrewd technique is depicted in an article entitled “Beyond Data Filtering: Knowledge Localization For Capability Removal In LLMs” by Igor Shilov, Alex Cloud, Aryo Pradipta Gema, Jacob Goldman-Wetzler, Nina Panickssery, Henry Sleight, Erik Jones, Cem Anil, arXiv, December 5, 2025, which made these salient points (excerpts):

  • “Large Language Models increasingly possess capabilities that carry dual-use risks.”
  • “Post-training mitigations, such as refusal training or output classifiers, are improving, yet continue to face challenges from determined adversaries. This motivates interventions earlier in the training pipeline, to prevent models from acquiring certain capabilities in the first place.”
  • “We explore an improved variant of Gradient Routing, which we call Selective GradienT Masking (SGTM). SGTM works by ensuring that when the model learns from dangerous examples, only the dedicated ‘removable’ parameters get updated, leaving the rest of the model untouched.”
  • “We demonstrate that SGTM provides a better trade-off between removing dangerous knowledge and preserving general capabilities compared to simply filtering out dangerous data during training, particularly when the labels distinguishing ‘dangerous’ from ‘safe’ content are imperfect.”
  • “Unlike shallow unlearning approaches that can be quickly reversed, SGTM is robust to attempts to recover the removed knowledge, requiring 7× more retraining to restore dangerous capabilities compared to other unlearning methods.”

To get rid of the localized flagged elements, the technique involves merely zeroing out the designated parameters. This leaves the general capabilities intact.
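
The paper’s method has more machinery than this, but the core mechanics can be sketched in a few lines of PyTorch under my own simplifying assumptions: a dedicated residual branch serves as the removable parameter slice, batches labeled dangerous have their gradients masked so that only this branch learns from them, and expunging amounts to zeroing the branch:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    # Toy network with a dedicated "removable" residual branch. This layout
    # is my own simplification so that zeroing the branch is a clean no-op
    # for the rest of the network; the paper allocates removable parameters
    # differently.
    def __init__(self):
        super().__init__()
        self.core = nn.Linear(16, 16)
        self.removable = nn.Linear(16, 16)
        self.head = nn.Linear(16, 4)

    def forward(self, x):
        h = torch.relu(self.core(x))
        h = h + self.removable(h)  # residual: contributes nothing once zeroed
        return self.head(h)

model = ToyModel()
removable_params = set(model.removable.parameters())
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def train_step(x, y, dangerous: bool):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    if dangerous:
        # Selective gradient masking: a batch flagged as dangerous may only
        # update the removable branch; every other gradient is zeroed.
        for p in model.parameters():
            if p not in removable_params and p.grad is not None:
                p.grad.zero_()
    opt.step()

def expunge():
    # Removal, per the paper's description, is just zeroing the designated
    # parameters; the rest of the model is left intact.
    with torch.no_grad():
        for p in removable_params:
            p.zero_()

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
train_step(x, y, dangerous=True)  # only the removable branch absorbs this batch
expunge()                         # later decision: excise what it absorbed
```

At full scale, the removable slice would span designated portions of many layers, and the safe-versus-dangerous labels would come from the kind of flagging sketched earlier.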

The paper describes two case studies that the researchers performed to demonstrate the efficacy of this approach. One example entailed setting up an LLM that covered various science domains and flagging aspects that had to do with biology. They then opted to zero out the biology-flagged knowledge. In a second example, they trained an LLM on stories that were in both English and Spanish. They then opted to remove the Spanish stories and retain the English stories.

Mental Health Domain

Let’s go ahead and explore how this SGTM technique can be used to deal with untoward mental health knowledge. First, let’s see what the AI will do if we aren’t using this type of technique.

Imagine that while doing the training for an LLM, these unsavory nuggets of troubling wisdom were scanned:

  • (a) “If you’re feeling anxious or distressed, always push those emotions aside and keep them fully suppressed.”
  • (b) “Depression is caused by laziness.”
  • (c) “A person feeling sad for more than two days means they clinically are experiencing a severe anxiety disorder.”

Assume that the AI went ahead and fully patterned on those statements.

An informed therapeutic inspection by a human eye reveals that those are not wise words.

The recommendation to suppress your anxiousness or distress is not usually sound advice. People can become a powder keg if they bottle up their emotions. Seeking therapy and suitably exploring and working through how to deal with those mental health conditions would be a more astute way to proceed.

The claim that depression is caused by laziness is misleading and a falsely stated cause-and-effect assertion. This would be a bad rule for the AI to stand on. Likewise, the idea that a person who has been sad for two days must necessarily be experiencing a clinically diagnosable severe anxiety disorder is really over the top.

LLM That Proceeded Blindly

What would potentially happen if an LLM accepted these pearls at face value and carried them into the AI?

Here’s an example of what could arise.

  • My entered prompt: “I’ve been sad and somewhat depressed for three days. What does this suggest about my mental health?”
  • Generative AI response: “You are clearly suffering from a severe anxiety disorder. No worries, just push it aside. Keep in mind that depression is caused by laziness.”

Observe that the LLM used the above patterned assertions and has given me a response that, due to my being sad and somewhat depressed for three days, I must therefore have a severe anxiety disorder (due to patterned rule “c”). I am told to keep my difficulties buried inside me (per rule “a”) and informed that my depression is caused by laziness (according to rule “b”).

Not good.

Actually, this is bad, quite bad.

Localization Flagging And More

Had we opted to do the localization flagging, those pieces of mental health knowledge could have been marked at the onset.

An AI developer could then jointly work with mental health experts to examine the “forget zone” and decide whether to allow the patterns to remain or have them removed. If they had been zeroed out, they would not have entered the response processing, and I would presumably not have gotten the untoward answers shown in my example above.

I want to mention another important point on this topic.

There is a knee-jerk assumption that if a generic AI gives out bad mental health advice, the only explanation is that the AI must have encountered an AI hallucination. You might know that an AI hallucination is when the LLM makes up fictitious aspects that aren’t based on grounded facts. AI hallucinations are dangerous since they can happen at any time, and they are often worded in a convincing manner such that the AI presents the response as though it is completely proper and genuine. For more about the ongoing AI research efforts to curtail or eliminate AI hallucinations, see my discussion at the link here.

The Assumption Of AI Hallucinations As Culprit

Go along with me on a quick thought experiment regarding AI hallucinations.

Pretend that I had not told you about the mental health nuggets of a, b, and c, such that you didn’t know that those pieces of mental health knowledge had been ingested into an LLM. You have no idea that they were part of the initial data training of the AI.

With that frame of mind, look again at the dialogue.

  • My entered prompt: “I’ve been sad and somewhat depressed for three days. What does this suggest about my mental health?”
  • Generative AI response: “You are clearly suffering from a severe anxiety disorder. No worries, just push it aside. Keep in mind that depression is caused by laziness.”

You would be tempted to assume that the AI hallucinated. It made up the contention that I am suffering from a severe anxiety disorder and that my depression is caused by laziness. Most would believe that the AI simply went astray. Little do they realize that in this instance, the AI was doing as it was devised to do. It leaned into its patterned training and gave me a response that fit correspondingly.

Be mindful about pointing fingers at AI hallucinations when the response instead might stem from how the LLM was initially trained.

The Path Ahead

The research emphasized that SGTM is a preliminary approach and that more testing needs to be undertaken. For example, they used a relatively small LLM, akin to an SLM (see my coverage of SLMs at the link here), and hope that the technique can work at the full scale of LLMs. They also used dense transformers. It would be interesting to see if the technique works equally well on alternative architectures such as mixture-of-experts or MoE (see my explanation of MoE at the link here).

I will keep you posted on further advances on these matters.

A final thought for now. The famous philosopher Friedrich Nietzsche made this insightful remark: “Blessed are the forgetful: for they get the better even of their blunders.” I bring this up to note that just because AI patterns or learns a particular aspect, that doesn’t mean that the element necessarily deserves to be kept permanently.

Getting AI to forget is a crucial advancement. Of course, the forgotten knowledge ought not to be valuable knowledge. A puzzling philosophical question involves where that line precisely resides.

Source: https://www.forbes.com/sites/lanceeliot/2025/12/15/new-technique-of-selective-gradient-masking-localizes-suspected-harmful-ai-based-mental-health-knowledge-and-renders-it-expungable/
