gh-152415: Exercise curses non-ASCII tests under 8-bit locale encodings #152416

Merged
serhiy-storchaka merged 2 commits into python:main from serhiy-storchaka:curses-test-8bit-locales on Jun 27, 2026

Conversation

@serhiy-storchaka

Member

The non-ASCII tests in test_curses only exercised what the test runner's locale could encode, in practice UTF-8, so the byte-oriented (8-bit locale) code paths were barely tested and several text-accepting methods were tested only with ASCII.

This extends the character and string I/O tests with cases for 8-bit encodings, each guarded by the existing encodability check (a case is skipped when the current locale cannot represent it): ASCII, a character common to the Latin encodings (é), and characters distinctive to a single encoding (byte 0xA4 is ¤ in ISO-8859-1, € in ISO-8859-15, є in KOI8-U). Running the whole suite under different locales (LANG=en_US.ISO8859-1, en_US.ISO8859-15, uk_UA.koi8u) covers those encodings.
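As an illustration of why byte 0xA4 is a useful probe, the sketch below decodes it with Python's standard codecs and applies a simple encodability check. The `encodable` helper is hypothetical and only stands in for the guard used in test_curses, whose actual implementation is not shown here.

```python
# Byte 0xA4 maps to a different character in each of the three 8-bit encodings.
BYTE = b"\xa4"
ENCODINGS = ("iso8859-1", "iso8859-15", "koi8-u")

decoded = {enc: BYTE.decode(enc) for enc in ENCODINGS}


def encodable(text, encoding):
    """Hypothetical stand-in for the suite's encodability check."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for enc in ENCODINGS:
    ch = decoded[enc]
    # Print the code point rather than the character so the output is ASCII-only.
    print(f"{enc}: 0xA4 -> U+{ord(ch):04X}, "
          f"can encode U+00E9: {encodable(chr(0xE9), enc)}")
```

Under this check, é is encodable in both Latin encodings but not in KOI8-U, so a case using it would be skipped under a KOI8-U locale.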

It also fills read-side and other gaps found by an audit of the text-accepting API: inch/instr, get_wstr (previously untested), getbkgd/getbkgrnd, unctrl, the default border()/box() ACS cells, and characters given as chtype ints > 127.

A couple of build-/locale-specific notes captured in the comments: on a wide build, inch() and int (chtype) characters round-trip only Latin-1 code points (the wide build stores text/ints through the locale), so those assertions are guarded by ord < 0x100, while instr() covers the full set.
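The guard can be pictured with a simplified model that does not call curses at all. It assumes, following the second commit message in this PR, that the wide-build read-back yields the low byte of the code point, and compares that with the byte the locale encoding would use. The function names are illustrative, and the model is only meant for the sample characters used in this PR.

```python
# Sample characters from the PR, each paired with an encoding that can represent it.
CASES = [
    ("\u00e9", "iso8859-1"),   # é
    ("\u20ac", "iso8859-15"),  # €
    ("\u0454", "koi8-u"),      # є
]


def low_byte(ch):
    """Low 8 bits of the code point (the modelled wide-build read-back)."""
    return ord(ch) & 0xFF


def locale_byte(ch, encoding):
    """The single byte the 8-bit locale encoding uses for the character."""
    return ch.encode(encoding)[0]


for ch, enc in CASES:
    matches = low_byte(ch) == locale_byte(ch, enc)
    guarded_in = ord(ch) < 0x100  # the guard described in the PR
    print(f"U+{ord(ch):04X} {enc}: low byte 0x{low_byte(ch):02X}, "
          f"locale byte 0x{locale_byte(ch, enc):02X}, "
          f"match={matches}, passes guard={guarded_in}")
```

For these three characters the guard admits exactly the one whose low byte and locale byte agree (é), which is consistent with restricting the inch() assertions to ord < 0x100.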

Test-only; no behaviour change. Verified on wide and narrow (ncursesw-disabled) builds under UTF-8, ISO-8859-1, ISO-8859-15 and KOI8-U.
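A minimal sketch of how such locale runs might be scripted follows. The PR does not give a command line, so the `python -m test -u curses test_curses` invocation and the `build_run` helper are assumptions, and the named locales must be installed on the system for the runs to be meaningful.

```python
import os
import sys

# Locale names taken from the PR description.
LOCALES = ("en_US.ISO8859-1", "en_US.ISO8859-15", "uk_UA.koi8u")


def build_run(locale_name, base_env=None):
    """Return (cmd, env) for one test_curses run under the given locale."""
    env = dict(os.environ if base_env is None else base_env)
    # LC_ALL and LC_CTYPE take precedence over LANG, so drop them.
    for name in ("LC_ALL", "LC_CTYPE"):
        env.pop(name, None)
    env["LANG"] = locale_name
    # Assumed invocation: regrtest with the curses resource enabled.
    cmd = [sys.executable, "-m", "test", "-u", "curses", "test_curses"]
    return cmd, env


runs = [build_run(name) for name in LOCALES]
# Each pair could then be executed with subprocess.run(cmd, env=env).
```

Building the commands separately from running them keeps the sketch free of side effects; cases the locale cannot represent are expected to skip rather than fail.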

…ncodings

The non-ASCII tests only exercised what the runner's locale could encode (in
practice UTF-8).  Add 8-bit-encoding cases to the character and string I/O
tests, each guarded by the existing encodability check: ASCII, a character
common to the Latin encodings ('é'), and ones distinctive to a single encoding
(byte 0xA4 is '¤' in ISO-8859-1, '€' in ISO-8859-15, 'є' in KOI8-U).  Run the
whole suite under different locales to cover them; unrepresentable cases skip.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bedevere-app (bot) added the "tests" label (Tests in the Lib/test dir) on Jun 27, 2026
serhiy-storchaka added the "needs backport to 3.15" (pre-release feature fixes, bugs and security fixes) and "skip news" labels on Jun 27, 2026
…haracter

Read each written character back with in_wch() or instr() rather than
inch(), which on a wide build returns the low byte of the code point
instead of the locale-encoded byte and so mangles a non-ASCII character
of an 8-bit locale.  This lets the int-argument cases cover '€'/'є', and
adds matching coverage for the str argument.

insch() with an int byte > 127 is checked only for Latin-1: on a wide
build ncurses winsch stores a printable byte directly as a code point
instead of decoding it through the locale.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@serhiy-storchaka serhiy-storchaka merged commit 003d362 into python:main Jun 27, 2026
47 of 48 checks passed
@miss-islington-app


Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.15.
🐍🍒⛏🤖

@serhiy-storchaka serhiy-storchaka deleted the curses-test-8bit-locales branch June 27, 2026 19:16
@miss-islington-app


Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.15 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 003d3620cc0f44caca7bf26c3e6964f5f379645f 3.15

@bedevere-app

bedevere-app Bot commented Jun 27, 2026


GH-152453 is a backport of this pull request to the 3.15 branch.

bedevere-app (bot) removed the "needs backport to 3.15" label (pre-release feature fixes, bugs and security fixes) on Jun 27, 2026
@serhiy-storchaka serhiy-storchaka removed their assignment Jun 27, 2026
serhiy-storchaka added a commit that referenced this pull request Jun 27, 2026
…encodings (GH-152416) (#152453)

The non-ASCII tests only exercised what the runner's locale could encode (in
practice UTF-8).  Add 8-bit-encoding cases to the character and string I/O
tests, each guarded by the existing encodability check: ASCII, a character
common to the Latin encodings ('é'), and ones distinctive to a single encoding
(byte 0xA4 is '¤' in ISO-8859-1, '€' in ISO-8859-15, 'є' in KOI8-U).  Run the
whole suite under different locales to cover them; unrepresentable cases skip.



* gh-152415: Verify character output round-trips in test_output_character

Read each written character back with in_wch() or instr() rather than
inch(), which on a wide build returns the low byte of the code point
instead of the locale-encoded byte and so mangles a non-ASCII character
of an 8-bit locale.  This lets the int-argument cases cover '€'/'є', and
adds matching coverage for the str argument.

insch() with an int byte > 127 is checked only for Latin-1: on a wide
build ncurses winsch stores a printable byte directly as a code point
instead of decoding it through the locale.
(cherry picked from commit 003d362)

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
serhiy-storchaka added a commit that referenced this pull request Jun 27, 2026
…encodings (GH-152416) (GH-152453) (GH-152457)

(cherry picked from commit 003d362)
(cherry picked from commit a75aa41)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
serhiy-storchaka added a commit that referenced this pull request Jun 27, 2026
…encodings (GH-152416) (GH-152453) (GH-152456)

(cherry picked from commit 003d362)
(cherry picked from commit a75aa41)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Labels

skip news, tests (Tests in the Lib/test dir)
