Module talk:DecodeEncode

Bug report: bad decoding of U+03B5 ε (epsilon)

About U+03B5 ε GREEK SMALL LETTER EPSILON (&epsi;, &epsilon;)

  • Issue: after HTML entity &epsi; is resolved by mw.text.decode(), the plain character is not found by mw.ustring.gsub(). There is no issue with the alternative HTML entity &epsilon;: &epsilon; good, &epsi; bad.
Report limitations: the original report and bug reproduction are at enwiki Module talk:DecodeEncode, where en:module:DecodeEncode and en:module:String are used live. At Phabricator, pseudo-code may be used and some "results" may be hardcoded. In running text the escape &amp; is used; inside function calls it is not. Lua patterns are not used (no %).
  • To reproduce:
1. Create research string:
X&epsi;1X&epsilon;2X (shows live and unedited as: Xε1Xε2X)
2. Decode the string with decode() (as the inner function).
3. On the decoded result, use gsub() to replace the plain character ε → E (as the outer function):
mw.ustring.gsub( mw.text.decode( 'X&epsi;1X&epsilon;2X', true ), 'ε', 'E' ) [see note; 21:10, 7 February 2023 (UTC)]
4. Result3 (s&r pattern uses the ε from Xε1X):
XE1XE2X
5. Result4 (s&r pattern uses the ε from Xε2X):
XE1XE2X
  • Expected: XE1XE2X (both entities decode to the same single character ε)
{{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s=X&epsi;1X&epsilon;2X}}|pattern=ε|replace=E|plain=true}}
→ XE1XE2X
-DePiep (talk) 21:10, 7 February 2023 (UTC)
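A standalone Lua sketch of one possible mechanism, offered as a hypothesis only (this thread does not establish the cause): if the decoder's named-entity table lacks `epsi`, decode() leaves &epsi; as literal text. The browser later renders it as ε, so the page looks right, but gsub() never sees a ε character at that position. `TOY_ENTITIES` and `toy_decode` are hypothetical stand-ins for mw.text.decode(), which only exists inside Scribunto.

```lua
local EPSILON = '\206\181' -- U+03B5 GREEK SMALL LETTER EPSILON, as UTF-8 bytes

-- Hypothetical named-entity table. 'epsi' is left out on purpose to model
-- the suspected gap; this is not MediaWiki's real table.
local TOY_ENTITIES = {
	epsilon = EPSILON,
}

-- Toy stand-in for mw.text.decode( s, true ): names found in TOY_ENTITIES are
-- decoded; unknown names (here &epsi;) stay as literal text, because a
-- gsub replacement function that returns nil keeps the original match.
local function toy_decode( s )
	return ( s:gsub( '&(%a+);', function ( name )
		return TOY_ENTITIES[name]
	end ) )
end

local decoded = toy_decode( 'X&epsi;1X&epsilon;2X' )
-- Plain replace ε -> E. The two UTF-8 bytes of ε are not pattern magic
-- characters, so string.gsub matches them literally.
local replaced = decoded:gsub( EPSILON, 'E' )

print( decoded )  -- X&epsi;1Xε2X (a browser still renders this as Xε1Xε2X)
print( replaced ) -- X&epsi;1XE2X
```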

Workaround A, ad hoc

Workaround A, ad hoc: add an innermost function that first replaces &epsi; → &epsilon; in the research string:

A1: {{#invoke:String|replace|source={{#invoke:DecodeEncode|decode|s={{#invoke:String|replace|source=X&epsi;1X&epsilon;2X|pattern=&epsi;|replace=&epsilon;|plain=true}}}}|pattern=ε|replace=E|plain=true}}
XE1XE2X

Workaround B, in module (THIN SPACE example)

Workaround B: early in :en:module:DecodeEncode, replace &epsi; → &epsilon;.

About THIN SPACE: it looks like U+2009 THIN SPACE (&thinsp;, &ThinSpace;) has a similar issue: &ThinSpace; good, &thinsp; bad.

Currently in code:

function p._decode( s, subset_only )
	local ret = nil;
	s = mw.ustring.gsub( s, '&thinsp;', '&ThinSpace;' ) -- Workaround for bug: &ThinSpace; gets properly decoded in decode, but &thinsp; doesn't.
	ret = mw.text.decode( s, not subset_only )
	return ret
end

In en:module:DecodeEncode/sandbox, I have coded a similar handling of EPSILON:

module:DecodeEncode, module:DecodeEncode/sandbox diff
function p._decode( s, subset_only )
	local ret = nil;
	-- U+2009 THIN SPACE: workaround for bug: HTML entity &thinsp; is decoded incorrectly. Entity &ThinSpace; gets decoded properly
	s = mw.ustring.gsub( s, '&thinsp;', '&ThinSpace;' )
	-- U+03B5 ε GREEK SMALL LETTER EPSILON: workaround for bug (phab:T328840): HTML entity &epsi; is decoded incorrectly for gsub(). Entity &epsilon; gets decoded properly
	s = mw.ustring.gsub( s, '&epsi;', '&epsilon;' )
	ret = mw.text.decode( s, not subset_only )
	return ret
end
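A table-driven variant of the same pre-decode step, as a sketch only: `ENTITY_ALIASES` and `normalize_entities` are hypothetical names, not part of the module. It uses plain string.gsub rather than mw.ustring.gsub, since the entity names are pure ASCII with no Lua pattern magic characters, and it runs in standalone Lua, so the final mw.text.decode() call is only indicated in a comment.

```lua
-- Hypothetical alias list: each entity reported as "bad" is rewritten to an
-- alias reported as decoding properly. An array keeps the order deterministic.
local ENTITY_ALIASES = {
	{ '&thinsp;', '&ThinSpace;' }, -- U+2009 THIN SPACE
	{ '&epsi;',   '&epsilon;' },   -- U+03B5 GREEK SMALL LETTER EPSILON (phab:T328840)
}

-- Rewrite known-bad entity names before decoding. '&', letters and ';' are not
-- Lua pattern magic characters, so each name is matched literally.
-- '&epsi;' cannot match inside '&epsilon;' because 'l' follows 'epsi' there.
local function normalize_entities( s )
	for _, alias in ipairs( ENTITY_ALIASES ) do
		s = s:gsub( alias[1], alias[2] )
	end
	return s
end

-- In the module, the result would then go to mw.text.decode( s, not subset_only ).
print( normalize_entities( 'X&epsi;1X&epsilon;2X' ) ) -- X&epsilon;1X&epsilon;2X
print( normalize_entities( 'a&thinsp;b' ) )           -- a&ThinSpace;b
```

Adding another alias then means one new row in the list rather than another gsub line with its own comment.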
  • /sandbox tests:
B. {{#invoke:String|replace|source={{#invoke:DecodeEncode/sandbox|decode|s=X&epsi;1X&epsilon;2X}}|pattern=ε|replace=E|plain=true}}
B1. ResultB1 (s&r pattern uses the ε from Xε1X): XE1XE2X
B2. ResultB2 (s&r pattern uses the ε from Xε2X): XE1XE2X

I propose editing the module along these lines.

Workaround C (mw, Lua)

Changes in mw, Lua: I have no idea.

testcases EPSILON

  • Original failure, now solved (no longer showing):
(hardcoded explanation here): in the cell marked Red X, the result showed as "XE1Xε2X". That is, the wikitext input "ε" was not recognised and replaced. -DePiep (talk) 07:49, 19 February 2023 (UTC)
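EPSILON &epsi; &epsilon; error & fix proposal (16 Feb 2023)
Columns: (1) id | (2) entity code | (3) plain | (4) mod:..decode(&entity;) | (5) replace(decode(..)) with E, s=&entity; | (6) replace(decode(..)) with E, s=checkstring
In (5) and (6), pattern = hardcoded ⟨ε⟩ from (3) plain; each cell gives the mod:..decode result first, then the mod:..decode/sandbox result.
checkstring | X&epsi;1X&epsilon;2X | >Xε1Xε2X< | >Xε1Xε2X< | |
EPSI | &epsi; | >ε< | >ε< | E / E | XE1XE2X / XE1XE2X
EPSILON | &epsilon; | >ε< | >ε< | E / E | XE1XE2X (Red X) / XE1XE2X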
This is a similar fix to the one U+2009 THIN SPACE (&thinsp;, &ThinSpace;) has, though the original cause of the bug may be different for THIN SPACE.
  • Phabricator T328840 did not gain traction. A fix there would be at MediaWiki level, not in this module.
-DePiep (talk) 06:22, 16 February 2023 (UTC)
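The checks in the testcases table can be sketched as a small standalone Lua loop. This is an illustration under assumptions: `toy_fixed_decode` is a hypothetical stand-in for the patched decode() (rewrite &epsi; to &epsilon;, then decode), because mw.text.decode() only exists inside Scribunto, and the expected values are the ones the table shows after the fix.

```lua
local EPSILON = '\206\181' -- U+03B5 GREEK SMALL LETTER EPSILON, as UTF-8 bytes

-- Hypothetical stand-in for the patched decode(): first rewrite &epsi; to
-- &epsilon; (the module's workaround), then decode &epsilon; only.
local function toy_fixed_decode( s )
	s = s:gsub( '&epsi;', '&epsilon;' )
	return ( s:gsub( '&epsilon;', EPSILON ) )
end

-- Rows mirror the testcases table: id, input s, expected replace( decode( s ) ) with E.
local CASES = {
	{ 'checkstring', 'X&epsi;1X&epsilon;2X', 'XE1XE2X' },
	{ 'EPSI',        '&epsi;',               'E' },
	{ 'EPSILON',     '&epsilon;',            'E' },
}

local failures = 0
for _, case in ipairs( CASES ) do
	local id, input, expected = case[1], case[2], case[3]
	-- Plain replace of ε by E; the UTF-8 bytes of ε are not pattern magic characters.
	local got = toy_fixed_decode( input ):gsub( EPSILON, 'E' )
	local status = 'ok'
	if got ~= expected then
		status = 'FAIL'
		failures = failures + 1
	end
	print( id, status, got )
end
print( 'failures: ' .. failures ) -- failures: 0
```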

Template-protected edit request on 16 February 2023

Issue: bad decoding of HTML entity &epsi; (Red X)
re U+03B5 ε GREEK SMALL LETTER EPSILON (&epsi;, &epsilon;)
Change: fix by replacing it with entity &epsilon; (Green tick) before applying decode(). See § Workaround B for the code diff and background; minor comment change.
Discussion: (1) reported at T328840, no responses (mw-level); (2) bug report here not challenged
Testcases: See § testcases EPSILON.
DePiep (talk) 06:49, 16 February 2023 (UTC)
 Done * Pppery * it has begun... 03:11, 19 February 2023 (UTC)

NBSP behaviour

Leaving this note here.

About NBSP, U+00A0 NO-BREAK SPACE (&nbsp;, &NonBreakingSpace;): with input &nbsp; I am experiencing problems reminiscent of § epsilon (T328840, now resolved).

When nested like (replace|s=(decode|s=AB&nbsp;YZ)|replace=AB_YZ), it returns code that breaks when used in or with HTML/CSS code such as span, sup, or class.

I have no time to build the reproduction/test, so I have to leave it for now. Not reported on Phabricator. DePiep (talk) 07:27, 20 February 2023 (UTC)

Template-protected edit request on 21 March 2023

Please replace all code in Module:DecodeEncode with module:DecodeEncode/sandbox (compare).

Change: apply require('strict'), and declare functions explicitly local. DePiep (talk) 14:34, 21 March 2023 (UTC)

Invitation is out. -DePiep (talk) 14:49, 21 March 2023 (UTC)
Update: Gonnym has made large improvements, so the sandbox diff is large. I do not see strict-related changes. DePiep (talk) 21:31, 21 March 2023 (UTC)
The changes are good and no globals remain. The two mw.ustring calls could be string calls. Johnuniq (talk) 06:40, 22 March 2023 (UTC)
Thanks. As said, I would like someone trusted to perform the edit request, because my editing or commenting in between does not help. DePiep (talk) 08:18, 22 March 2023 (UTC)
 Done — Martin (MSGJ · talk) 18:35, 22 March 2023 (UTC)
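For context on the require('strict') request: the intent is that undeclared globals become errors, so helper functions must be declared local. The block below is a rough plain-Lua approximation using a metatable on _G; it is not Scribunto's strict module, whose exact rules may differ, and `decode_stub` is a hypothetical placeholder.

```lua
-- Rough approximation of a "strict" mode (not Scribunto's implementation):
-- creating or reading an undeclared global raises an error.
setmetatable( _G, {
	__newindex = function ( _, name )
		error( 'assign to undeclared global: ' .. tostring( name ), 2 )
	end,
	__index = function ( _, name )
		error( 'read of undeclared global: ' .. tostring( name ), 2 )
	end,
} )

local p = {}

-- Declared local, so it is not a global and is allowed under strict mode.
local function decode_stub( s )
	return s
end

-- A field of the local table p, not a global, so this is allowed too.
function p.decode( args )
	return decode_stub( args.s )
end

-- An accidental global (for example a helper declared without 'local') now fails.
local ok = pcall( function () undeclared_helper = 1 end )
print( ok ) -- false
```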
