Skip to content

Refactor cpuid#1251

Open
AntoinePrv wants to merge 21 commits intoxtensor-stack:masterfrom
AntoinePrv:cpuid
Open

Refactor cpuid#1251
AntoinePrv wants to merge 21 commits intoxtensor-stack:masterfrom
AntoinePrv:cpuid

Conversation

@AntoinePrv
Copy link
Contributor

This is a no-addition, non-breaking refactor of existing CPU id features as first class citizen as part of #1245.

  • Two individual and simple classes parsing for XCR0 and CPUID
  • The classes/header is safe on all platform, avoiding
  • supported_arch keeps the same structure and merges use of both class at the moment but both class could be combined a user-friendly x86_cpu_features.

Copy link
Contributor

@serge-sans-paille serge-sans-paille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some early review, a few open questions but nothing looks bad here. Way to go!

{
inline void get_cpuid(int reg[4], int level, int count = 0) noexcept;

inline std::uint32_t get_xcr0_low() noexcept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good to somehow ensure that this type is the same as ref_t

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this could be a private static method for better encapsulation right?

constexpr bool fma4() const noexcept
{
return utils::bit_is_set<16>(m_regs.reg8[2]);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question: should we have a macro that makes the generation of those field cleaner to the eye?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way. I am personally not big on macros. I find it easier read this way (could also be made to fit on one line) and find&replace is quite enough a change is required.


namespace detail
{
inline void get_cpuid(int reg[4], int level, int count) noexcept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: could be private for better encapsulation.

Copy link
Contributor Author

@AntoinePrv AntoinePrv Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the contrary, I was wondering whether to make it part of public API.
It's a well defined function, unlikely to change, and that could be useful for users to get features we do not expose (not sure we'll go to cover 100% of cpuid).

* The full license is in the file LICENSE, distributed with this software. *
****************************************************************************/

#include "../config/xsimd_inline.hpp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It for header including what they use (otherwise I get nasty errors in my editor).
But this one did not belong here but in xsimd_register.hpp where XSIMD_INLINE is used.

Copy link
Contributor Author

@AntoinePrv AntoinePrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on change may make relative to the previous implementation

Comment on lines 260 to 261
#elif defined(__INTEL_COMPILER)
__cpuid(reg.data(), level);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change to use inline ASM for intel compiler? Missing the count option here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumle so, feel free to give it a try!

#elif defined(_MSC_VER) && _MSC_VER >= 1400
return static_cast<xcr0_reg_t>(_xgetbv(0));

#elif defined(__GNUC__)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about __clang__ and __INTEL_COMPILER? Should we reproduce the get_cpuid pattern?

Copy link
Member

@JohanMabille JohanMabille Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. Also, IIRC, __clang__ comes with __GNUC__ defined too, and Intel compiler defines __GNUC__ or _MSC_VER depending on the platform. So we may simplify this logic (but first we need to verify that my asumption is right).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that clang defines __GNUC__. And it happens to be true for icx too, see https://godbolt.org/z/56j57vhnT

@AntoinePrv AntoinePrv marked this pull request as draft February 2, 2026 17:18
@AntoinePrv
Copy link
Contributor Author

@serge-sans-paille what do you think of this?

I wanted to add an x86_cpu_features that mixed xcr0, cpuid, as well as future opinionated features (e.g. override avx2 detection from cpuid if the specific implementation is known not to be performant).
However today that felt very redundant, perhaps in the future if we simplify the logic of supported_arch.

@AntoinePrv AntoinePrv marked this pull request as ready for review February 4, 2026 10:13
#elif defined(_MSC_VER) && _MSC_VER >= 1400
return static_cast<xcr0_reg_t>(_xgetbv(0));

#elif defined(__GNUC__)
Copy link
Member

@JohanMabille JohanMabille Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. Also, IIRC, __clang__ comes with __GNUC__ defined too, and Intel compiler defines __GNUC__ or _MSC_VER depending on the platform. So we may simplify this logic (but first we need to verify that my asumption is right).

@serge-sans-paille
Copy link
Contributor

serge-sans-paille commented Feb 20, 2026

@AntoinePrv please ping me when you need a review!

@AntoinePrv
Copy link
Contributor Author

@serge-sans-paille this is ready. This is intended to make a clean abstraction for CPUID.
In follow-up PRs, I will:

  • Add some sort of similar abstraction for arm64
  • Propose a generic user-facing interface.

regs.reg1 = detail::get_cpuid(0x1);
regs.reg7 = detail::get_cpuid(0x7);
regs.reg7a = detail::get_cpuid(0x7, 0x1);
regs.reg8 = detail::get_cpuid(0x80000001);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question: could we avoid some get_cpuid call in some situation? For instance reg8 is only used for fma4, and reg7 for avx2 or later, which mean we could skip filling reg7 if we don't have avx, and fma4 if have fma3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's postpone this as a further optimization then?

Comment on lines 260 to 261
#elif defined(__INTEL_COMPILER)
__cpuid(reg.data(), level);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumle so, feel free to give it a try!

#elif defined(_MSC_VER) && _MSC_VER >= 1400
return static_cast<xcr0_reg_t>(_xgetbv(0));

#elif defined(__GNUC__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that clang defines __GNUC__. And it happens to be true for icx too, see https://godbolt.org/z/56j57vhnT

@AntoinePrv
Copy link
Contributor Author

@serge-sans-paille in 1/2 day I can push another commit (easily revertable) with things I've been trying lately.
Basically the idea is to wrap individual register with strongly typed bitset exposed by an enum (essentially what is done for xcr0).

@serge-sans-paille
Copy link
Contributor

serge-sans-paille commented Feb 25, 2026 via email

@AntoinePrv AntoinePrv force-pushed the cpuid branch 2 times, most recently from dd51e01 to d63a75f Compare February 25, 2026 14:49
@AntoinePrv
Copy link
Contributor Author

@serge-sans-paille with this last commit we have

  • uint_bitset is a generic type that wrap any integer + enum into a strongly typed and descriptive bit set
  • It's used for all reading all registers
    Raw classes such as x86_xcr0 or x86_cpuid_leaf1 can be used as to read and parse registers with strongly typed enums, no more, no less.
  • x86_cpu_features is an advanced class that combine CPUID results with XCR0 permissions. It is also lazy, reading CPUID registers as needed, including short circuiting if a permission would return false anyways. We can also use the class in the future to override if certain platforms are known to give inaccurate replies.

It's more LOC (though there is more documentation now, and many empty classes / brace lines), but it's also more declarative from now on. We just need to add more members to the enums to read other things.
x86_cpu_features on the contrary feels a bit repetitive, though I am not sure if there is a way to intelligently match permissions and cpu features without more boilerplate.

If you like the design I'd be happy to add tests for all classes with know implications (e.g. sse2 is always true on x86_64, avx2 implies avx and sse4.2...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants