This study addresses a crucial source of instability in hyperbolic deep learning: learning the curvature of the manifold. The authors point out a fundamental weakness of naive approaches, namely that performance deteriorates when the curvature parameter is updated out of step with the model parameters, invalidating the Riemannian gradients and projections. They address this with a new ordered projection schema that first projects the model parameters to a stable tangent space at the origin, then updates the curvature, and finally re-projects the parameters onto the new manifold.

Understanding Training Stability in Hyperbolic Neural Networks


Abstract and 1. Introduction

  2. Related Work

  3. Methodology

    3.1 Background

    3.2 Riemannian Optimization

    3.3 Towards Efficient Architectural Components

  4. Experiments

    4.1 Hierarchical Metric Learning Problem

    4.2 Standard Classification Problem

  5. Conclusion and References

3.1 Background


3.2 Riemannian Optimization

Optimizers for Learned Curvatures In their hyperbolic learning library GeoOpt, Kochurov et al. [21] attempt to make the curvature of the hyperbolic space a learnable parameter. However, we have found no further work that makes proper use of this feature, and our empirical tests show that this approach often results in greater instability and degraded performance. We attribute these issues to the naive implementation of curvature updates, which fails to incorporate the updated hyperbolic operations into the learning algorithm. Specifically, Riemannian optimizers rely on projecting Euclidean gradients and momenta onto the tangent spaces at the hyperbolic parameters being updated, and these operations depend on the current properties of the manifold that houses those parameters. From this, we can identify one main issue with the naive curvature learning approach.

The order in which parameters are updated is crucial. If the curvature of the space is updated before the hyperbolic parameters, the Riemannian projections and tangent projections of the gradients and momenta become invalid: the projection operations start using the new curvature value even though the hyperbolic parameters, hyperbolic gradients, and momenta have not yet been re-projected onto the new manifold.

To resolve this issue, we propose a projection schema and an ordered parameter update process. To sequentialize the optimization, we first update all manifold and Euclidean parameters and only then update the curvatures. Next, we parallel transport all Riemannian gradients and project all hyperbolic parameters to the tangent space at the origin using the old curvature value. Since this tangent space remains invariant when the manifold curvature changes, the points can be treated as lying on the tangent space of the new origin as well. We then re-project the hyperbolic tensors back onto the manifold using the new curvature value and parallel transport the Riemannian gradients to their respective parameters. This process is illustrated in Algorithm 1, and a sketch of the ordering is given below.
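A minimal PyTorch sketch of this ordering is given below, using hand-rolled exponential and logarithmic maps at the origin of the hyperboloid with curvature $-K$. The function and optimizer names are illustrative, and the parallel transport of optimizer momenta described above is omitted for brevity, so this is a sketch of the idea rather than the authors' Algorithm 1.

```python
import torch

def lorentz_logmap0(x, K):
    """Log map at the origin o = (1/sqrt(K), 0, ..., 0) of the hyperboloid
    with curvature -K; returns a tangent vector with zero time component."""
    sqrtK = K.sqrt()
    x_time, x_space = x[..., :1], x[..., 1:]
    sp_norm = x_space.norm(dim=-1, keepdim=True).clamp_min(1e-7)
    # geodesic distance from the origin: d = arccosh(sqrt(K) * x_0) / sqrt(K)
    dist = torch.acosh((sqrtK * x_time).clamp_min(1.0 + 1e-7)) / sqrtK
    return torch.cat([torch.zeros_like(x_time), dist * x_space / sp_norm], dim=-1)

def lorentz_expmap0(v, K):
    """Exp map at the origin of the hyperboloid with curvature -K."""
    sqrtK = K.sqrt()
    v_space = v[..., 1:]
    vn = v_space.norm(dim=-1, keepdim=True).clamp_min(1e-7)
    time = torch.cosh(sqrtK * vn) / sqrtK
    space = torch.sinh(sqrtK * vn) * v_space / (sqrtK * vn)
    return torch.cat([time, space], dim=-1)

def ordered_update(hyp_params, curvature, riem_opt, eucl_opt, curv_opt):
    """Ordered projection schema: manifold/Euclidean parameters first,
    curvature last, with re-projection through the origin's tangent space."""
    K_old = curvature.detach().clone()
    riem_opt.step()          # Riemannian step on hyperbolic parameters (old curvature)
    eucl_opt.step()          # ordinary step on Euclidean parameters
    with torch.no_grad():
        # 1) pull every hyperbolic parameter to the tangent space at the origin
        tangents = [lorentz_logmap0(p, K_old) for p in hyp_params]
    curv_opt.step()          # 2) only now update the curvature itself
    with torch.no_grad():
        K_new = curvature.detach().clone()
        # 3) push the parameters back onto the manifold with the new curvature
        for p, v in zip(hyp_params, tangents):
            p.copy_(lorentz_expmap0(v, K_new))
```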


Riemannian AdamW Optimizer Recent works, especially with transformers, rely on the AdamW optimizer proposed by Loshchilov and Hutter [26] for training. At present, there is no established Riemannian variant of this optimizer. We attempt to derive AdamW for the Lorentz manifold and argue that a similar approach could be generalized to the Poincaré ball. The main difference between AdamW and Adam is the direct weight regularization, which is more difficult to perform in Lorentz space given the lack of an intuitive subtraction operation on the manifold. To resolve this, we instead model the regularized parameter as a weighted centroid with the origin. The regularization schema becomes:
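The schema itself appears as an image in the source and is not reproduced here. A plausible reconstruction, assuming the standard Lorentz centroid of the parameter $\mathbf{x}$ and the hyperboloid origin $\mathbf{o}$ with AdamW's decay factor $\eta\lambda$ as the mixing weight (the authors' exact weighting may differ), is:

$$
\mathbf{x} \;\leftarrow\; \frac{(1 - \eta\lambda)\,\mathbf{x} + \eta\lambda\,\mathbf{o}}{\sqrt{K}\,\bigl\lVert (1 - \eta\lambda)\,\mathbf{x} + \eta\lambda\,\mathbf{o} \bigr\rVert_{\mathcal{L}}}, \qquad \lVert \mathbf{a} \rVert_{\mathcal{L}} = \sqrt{\lvert \langle \mathbf{a}, \mathbf{a} \rangle_{\mathcal{L}} \rvert},
$$

where $K$ is the curvature magnitude and the denominator renormalizes the weighted average back onto the hyperboloid; in the Euclidean limit this weighting reduces to AdamW's usual decay $\mathbf{x} \leftarrow (1 - \eta\lambda)\,\mathbf{x}$.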


As such, we propose a maximum-distance rescaling function on the tangent space at the origin to conform with the representational capacity of hyperbolic manifolds.
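The rescaling function itself is shown as an equation in the source. A plausible form of such a cap on the distance from the origin, acting on a tangent vector $\mathbf{v}$ at the origin with a maximum radius $r_{\max}$ treated here as a hyperparameter (the authors' exact parameterization may differ), is:

$$
\operatorname{rescale}(\mathbf{v}) \;=\; r_{\max}\,\tanh\!\left(\frac{\lVert \mathbf{v} \rVert}{r_{\max}}\right)\frac{\mathbf{v}}{\lVert \mathbf{v} \rVert},
$$

which leaves short vectors nearly unchanged and smoothly saturates their norm at $r_{\max}$ before the exponential map carries them onto the manifold.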


Specifically, we apply this rescaling when moving parameters across different manifolds: from Euclidean space to the Lorentz space, and between Lorentz spaces of different curvatures. We also apply it after Lorentz boosts and direct Lorentz concatenations [31]. Additionally, we add this operation after the variance-based rescaling in the batchnorm layer, because adjusting to the variance can push points outside the maximum radius during the operation.

3.3 Towards Efficient Architectural Components

Lorentz Convolutional Layer In their work, Bdeir et al. [1] dissect the convolution operation into a window-unfolding followed by a modified version of the Lorentz linear layer by Chen et al. [3]. However, Dai et al. [5] offer an alternative definition of the Lorentz linear layer based on a direct decomposition of the operation into a Lorentz boost and a Lorentz rotation. We follow the dissection scheme of Bdeir et al. [1] but rely on Dai et al.'s [5] alternate definition of the Lorentz linear transformation. The core change is moving from a matrix multiplication on the spatial dimensions followed by a reprojection to learning an individual rotation operation and a Lorentz boost:


out = LorentzBoost(TanhRescaling(RotationConvolution(x)))

where TanhRescaling is the operation described in 2 and RotationConvolution is a normal convolution parameterized through the procedure in 2, with Orthogonalize being a Cayley transformation similar to [16]. We use the Cayley transformation in particular because it always results in an orthonormal matrix with a positive determinant, which prevents the rotated point from being carried to the lower sheet of the hyperboloid. A sketch of this parameterization is given below.
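A compact PyTorch sketch of the Cayley parameterization is shown below; the function name and the usage note are illustrative assumptions rather than the authors' implementation.

```python
import torch

def cayley_orthogonalize(W: torch.Tensor) -> torch.Tensor:
    """Cayley transform: map an unconstrained square matrix to a rotation matrix.

    A = W - W^T is skew-symmetric, so (I + A) is invertible and
    Q = (I - A)(I + A)^{-1} is orthonormal with det(Q) = +1, i.e. a proper
    rotation that cannot flip points onto the lower hyperboloid sheet.
    """
    A = W - W.transpose(-1, -2)
    I = torch.eye(W.shape[-1], dtype=W.dtype, device=W.device)
    return (I - A) @ torch.linalg.inv(I + A)

# Usage sketch: the rotation part of the Lorentz convolution would be obtained by
# orthogonalizing the square weight acting on the unfolded spatial coordinates,
# then applying it to each window before the tanh rescaling and the Lorentz boost.
W = torch.randn(8, 8) * 0.1
Q = cayley_orthogonalize(W)
assert torch.allclose(Q @ Q.T, torch.eye(8), atol=1e-5)   # orthonormal
assert torch.det(Q) > 0                                   # proper rotation
```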

Lorentz-Core Bottleneck Block In an effort to expand on the idea of hybrid hyperbolic encoders [1], we design the Lorentz-Core bottleneck block for hyperbolic ResNet-based models. It mirrors a standard Euclidean bottleneck block, except that the internal 3x3 convolutional layer is replaced with our efficient convolutional layer, as seen in Figure 1. We thereby benefit from a hyperbolic structuring of the embeddings in each block while maintaining the flexibility and speed of Euclidean models, and we interpret this integration as a form of hyperbolic bias that can be adopted into ResNets without strict hyperbolic modeling.
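As a rough illustration of this hybrid design (the exact wiring is given in Figure 1, which is not reproduced here), such a block might be assembled as follows. `lorentz_conv3x3`, `to_lorentz`, and `to_euclidean` are assumed placeholders standing in for the efficient convolutional layer above and the maps onto and off the manifold, not the authors' modules.

```python
import torch
import torch.nn as nn

class LorentzCoreBottleneck(nn.Module):
    """Euclidean bottleneck whose middle 3x3 convolution is hyperbolic (sketch)."""
    def __init__(self, in_ch, mid_ch, out_ch, lorentz_conv3x3, to_lorentz, to_euclidean):
        super().__init__()
        self.reduce = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 1, bias=False),
                                    nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.core = lorentz_conv3x3                     # hyperbolic 3x3 core
        self.to_lorentz, self.to_euclidean = to_lorentz, to_euclidean
        self.expand = nn.Sequential(nn.Conv2d(mid_ch, out_ch, 1, bias=False),
                                    nn.BatchNorm2d(out_ch))
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1, bias=False))

    def forward(self, x):
        h = self.reduce(x)                              # Euclidean 1x1 reduction
        h = self.to_euclidean(self.core(self.to_lorentz(h)))  # hyperbolic core
        h = self.expand(h)                              # Euclidean 1x1 expansion
        return torch.relu(h + self.shortcut(x))         # usual residual connection
```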



:::info Authors:

(1) Ahmad Bdeir, Data Science Department, University of Hildesheim (bdeira@uni-hildesheim.de);

(2) Niels Landwehr, Data Science Department, University of Hildesheim (landwehr@uni-hildesheim.de).

:::


:::info This paper is available on arXiv under a CC BY 4.0 Deed (Attribution 4.0 International) license.

:::

