User:Polygnotus/Scripts/DeduplicateReferencesTest

Test Case 1: Simple Duplicates

Input

== Introduction ==
This is a simple article with duplicate references.<ref>{{cite web|url=https://example.com/article1|title=First Article|date=2024-01-01}}</ref>

== Section 1 ==
More text here.<ref>{{cite web|url=https://example.com/article1|title=First Article|date=2024-01-01}}</ref>

== Section 2 ==
Even more text.<ref>{{cite web|url=https://example.com/article1|title=First Article|date=2024-01-01}}</ref>

== References ==
{{reflist}}

Expected Output

== Introduction ==
This is a simple article with duplicate references.<ref name="firstarticle">{{cite web|url=https://example.com/article1|title=First Article|date=2024-01-01}}</ref>

== Section 1 ==
More text here.<ref name="firstarticle" />

== Section 2 ==
Even more text.<ref name="firstarticle" />

== References ==
{{reflist}}

Expected Result

  • Should deduplicate 2 references
  • Should generate name from title: "firstarticle"
  • First occurrence gets full ref with name, subsequent get short form
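
The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script's actual code: the helper names `name_from_title` and `deduplicate` are hypothetical, and the name rule (lowercase the title, keep only letters and digits) is inferred from the expected name "firstarticle". Test Case 6 expects a shortened name for a long title; this sketch does not attempt that, because the page does not state the shortening rule.

```python
import re
from collections import Counter

# Matches unnamed references only; <ref name="..."> never matches.
UNNAMED_REF_RE = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)
# Captures the value of the |title= parameter of a cite template.
TITLE_RE = re.compile(r"\|\s*title\s*=\s*([^|}]+)")


def name_from_title(body):
    """Hypothetical helper: lowercase the title, keep letters and digits."""
    match = TITLE_RE.search(body)
    if match is None:
        return None
    return re.sub(r"[^a-z0-9]", "", match.group(1).lower())


def deduplicate(wikitext):
    """Return (new_wikitext, number_of_refs_replaced_by_the_short_form)."""
    counts = Counter(UNNAMED_REF_RE.findall(wikitext))
    seen = set()
    shortened = 0

    def rewrite(match):
        nonlocal shortened
        body = match.group(1)
        name = name_from_title(body)
        # Leave unique refs, and refs without a title, untouched.
        if counts[body] < 2 or name is None:
            return match.group(0)
        if body in seen:
            shortened += 1
            return f'<ref name="{name}" />'
        seen.add(body)
        return f'<ref name="{name}">{body}</ref>'

    new_text = UNNAMED_REF_RE.sub(rewrite, wikitext)
    return new_text, shortened
```

Run on the Test Case 1 input, this sketch names the first reference and replaces the other two with the short form, which gives the count of 2.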

Test Case 2: Cite Templates with Different Types

Input

== Article ==
Climate change is a major issue.<ref>{{cite journal|title=Climate Change Impact|author=Smith, J.|journal=Nature|year=2023|pages=45-60}}</ref>

Scientists agree that action is needed.<ref>{{cite journal|title=Climate Change Impact|author=Smith, J.|journal=Nature|year=2023|pages=45-60}}</ref>

The evidence is overwhelming.<ref>{{cite journal|title=Climate Change Impact|author=Smith, J.|journal=Nature|year=2023|pages=45-60}}</ref>

Another point about weather patterns.<ref>{{cite book|title=Weather Patterns|author=Jones, A.|publisher=Academic Press|year=2022}}</ref>

More discussion of weather.<ref>{{cite book|title=Weather Patterns|author=Jones, A.|publisher=Academic Press|year=2022}}</ref>

== References ==
{{reflist}}

Expected Output

== Article ==
Climate change is a major issue.<ref name="climatechangeimpact">{{cite journal|title=Climate Change Impact|author=Smith, J.|journal=Nature|year=2023|pages=45-60}}</ref>

Scientists agree that action is needed.<ref name="climatechangeimpact" />

The evidence is overwhelming.<ref name="climatechangeimpact" />

Another point about weather patterns.<ref name="weatherpatterns">{{cite book|title=Weather Patterns|author=Jones, A.|publisher=Academic Press|year=2022}}</ref>

More discussion of weather.<ref name="weatherpatterns" />

== References ==
{{reflist}}

Expected Result

  • Should deduplicate 3 references (2 journal + 1 book)
  • Should generate names: "climatechangeimpact" and "weatherpatterns"

Test Case 3: Mixed Plain Text and Templates

Input

== Mixed References ==
Plain text reference.<ref>This is a plain text reference about something interesting.</ref>

Another mention.<ref>This is a plain text reference about something interesting.</ref>

A citation template.<ref>{{cite web|url=https://news.example.org/story|title=Breaking News Story|date=2024-12-01}}</ref>

Reference to the same news.<ref>{{cite web|url=https://news.example.org/story|title=Breaking News Story|date=2024-12-01}}</ref>

Mixed with plain text again.<ref>This is a plain text reference about something interesting.</ref>

== References ==
{{reflist}}

Expected Output

== Mixed References ==
Plain text reference.<ref name="this_is_a">This is a plain text reference about something interesting.</ref>

Another mention.<ref name="this_is_a" />

A citation template.<ref name="breakingnewsstory">{{cite web|url=https://news.example.org/story|title=Breaking News Story|date=2024-12-01}}</ref>

Reference to the same news.<ref name="breakingnewsstory" />

Mixed with plain text again.<ref name="this_is_a" />

== References ==
{{reflist}}

Expected Result

  • Should deduplicate 3 references total
  • Plain text gets name from first 3 words: "this_is_a"
  • Template gets name from title: "breakingnewsstory"
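
The plain-text naming rule can be illustrated with a short Python sketch. The function name `name_from_plain_text` and the choice to treat runs of letters and digits as words are assumptions; the page only states that the first 3 words are used and shows "this_is_a" as the result.

```python
import re


def name_from_plain_text(body, word_count=3):
    """Hypothetical helper: join the first few lowercase words with underscores."""
    words = re.findall(r"[a-z0-9]+", body.lower())
    return "_".join(words[:word_count])


print(name_from_plain_text(
    "This is a plain text reference about something interesting."
))
```

For the plain-text reference in this test case the sketch prints `this_is_a`.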

Test Case 4: Already Named References (Should Not Modify)

Input

== Already Named ==
This reference is already named.<ref name="smith2023">{{cite journal|title=Important Research|author=Smith|year=2023}}</ref>

Reusing the named reference.<ref name="smith2023" />

A new duplicate reference without a name.<ref>{{cite book|title=Book Title|author=Jones|year=2022}}</ref>

Same reference again, unnamed.<ref>{{cite book|title=Book Title|author=Jones|year=2022}}</ref>

Using the named reference again.<ref name="smith2023" />

== References ==
{{reflist}}

Expected Output

== Already Named ==
This reference is already named.<ref name="smith2023">{{cite journal|title=Important Research|author=Smith|year=2023}}</ref>

Reusing the named reference.<ref name="smith2023" />

A new duplicate reference without a name.<ref name="booktitle">{{cite book|title=Book Title|author=Jones|year=2022}}</ref>

Same reference again, unnamed.<ref name="booktitle" />

Using the named reference again.<ref name="smith2023" />

== References ==
{{reflist}}

Expected Result

  • Should deduplicate 1 reference (only the unnamed duplicate)
  • Should NOT modify or count the already-named references
  • Named ref "smith2023" and its reuses remain unchanged
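
One way to leave named references alone is to match only the bare `<ref>` opening tag when collecting candidates. The Python sketch below shows that idea; the pattern and the helper name `unnamed_ref_bodies` are assumptions for illustration and are not taken from the script itself.

```python
import re

# A literal "<ref>" cannot match '<ref name="smith2023">' or '<ref name="smith2023" />',
# so named references never become deduplication candidates.
UNNAMED_REF_RE = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)


def unnamed_ref_bodies(wikitext):
    """Return the bodies of unnamed references, in document order."""
    return UNNAMED_REF_RE.findall(wikitext)


sample = (
    'Named.<ref name="smith2023">{{cite journal|title=Important Research|author=Smith|year=2023}}</ref>\n'
    'Reuse.<ref name="smith2023" />\n'
    'Unnamed.<ref>{{cite book|title=Book Title|author=Jones|year=2022}}</ref>\n'
    'Again.<ref>{{cite book|title=Book Title|author=Jones|year=2022}}</ref>\n'
)
for body in unnamed_ref_bodies(sample):
    print(body)
```

Only the two unnamed book references are returned, so only they can be counted or rewritten.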

Test Case 5: URL-Based References

Input

== URL-Based ==
Reference from BBC.<ref>https://www.bbc.com/news/article-12345</ref>

Another BBC reference.<ref>https://www.bbc.com/news/article-12345</ref>

Reference from The Guardian.<ref>https://www.theguardian.com/world/2024/article</ref>

Same Guardian article.<ref>https://www.theguardian.com/world/2024/article</ref>

Third time for BBC.<ref>https://www.bbc.com/news/article-12345</ref>

== References ==
{{reflist}}

Expected Output

== URL-Based ==
Reference from BBC.<ref name="bbc_com">https://www.bbc.com/news/article-12345</ref>

Another BBC reference.<ref name="bbc_com" />

Reference from The Guardian.<ref name="theguardian_com">https://www.theguardian.com/world/2024/article</ref>

Same Guardian article.<ref name="theguardian_com" />

Third time for BBC.<ref name="bbc_com" />

== References ==
{{reflist}}

Expected Result

  • Should deduplicate 3 references total
  • Should extract domain names: "bbc_com" and "theguardian_com"
  • www. should be stripped from domain names
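
The domain-based naming can be sketched with Python's standard `urllib.parse` module. The function name `name_from_url` is hypothetical, and replacing dots with underscores is inferred from the expected names "bbc_com" and "theguardian_com".

```python
from urllib.parse import urlparse


def name_from_url(url):
    """Hypothetical helper: turn a bare URL into a ref name based on its domain."""
    host = urlparse(url.strip()).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]  # strip the leading www.
    return host.replace(".", "_")


print(name_from_url("https://www.bbc.com/news/article-12345"))
print(name_from_url("https://www.theguardian.com/world/2024/article"))
```

For the two URLs in this test case the sketch prints `bbc_com` and `theguardian_com`.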

Test Case 6: Complex Article with Multiple Reference Types

Input

== Complex Article ==
The theory of relativity was developed by Einstein.<ref>{{cite book|title=Relativity: The Special and General Theory|author=Einstein, Albert|year=1916|publisher=Henry Holt}}</ref>

Einstein's work revolutionized physics.<ref>{{cite book|title=Relativity: The Special and General Theory|author=Einstein, Albert|year=1916|publisher=Henry Holt}}</ref>

== Quantum Mechanics ==
Quantum mechanics is another fundamental theory.<ref>{{cite journal|title=Quantum Theory|author=Bohr, Niels|journal=Physical Review|year=1928|pages=580-590}}</ref>

Bohr made significant contributions.<ref>{{cite journal|title=Quantum Theory|author=Bohr, Niels|journal=Physical Review|year=1928|pages=580-590}}</ref>

== Modern Physics ==
Modern physics combines both theories.<ref>{{cite book|title=Relativity: The Special and General Theory|author=Einstein, Albert|year=1916|publisher=Henry Holt}}</ref>

Some researchers dispute certain interpretations.<ref>https://www.scientificamerican.com/physics-debates/</ref>

Further discussion of disputes.<ref>https://www.scientificamerican.com/physics-debates/</ref>

== Legacy ==
Einstein's legacy continues.<ref>{{cite book|title=Relativity: The Special and General Theory|author=Einstein, Albert|year=1916|publisher=Henry Holt}}</ref>

As does Bohr's influence.<ref>{{cite journal|title=Quantum Theory|author=Bohr, Niels|journal=Physical Review|year=1928|pages=580-590}}</ref>

== References ==
{{reflist}}

Expected Output

== Complex Article ==
The theory of relativity was developed by Einstein.<ref name="relativitythespecial">{{cite book|title=Relativity: The Special and General Theory|author=Einstein, Albert|year=1916|publisher=Henry Holt}}</ref>

Einstein's work revolutionized physics.<ref name="relativitythespecial" />

== Quantum Mechanics ==
Quantum mechanics is another fundamental theory.<ref name="quantumtheory">{{cite journal|title=Quantum Theory|author=Bohr, Niels|journal=Physical Review|year=1928|pages=580-590}}</ref>

Bohr made significant contributions.<ref name="quantumtheory" />

== Modern Physics ==
Modern physics combines both theories.<ref name="relativitythespecial" />

Some researchers dispute certain interpretations.<ref name="scientificamerican_com">https://www.scientificamerican.com/physics-debates/</ref>

Further discussion of disputes.<ref name="scientificamerican_com" />

== Legacy ==
Einstein's legacy continues.<ref name="relativitythespecial" />

As does Bohr's influence.<ref name="quantumtheory" />

== References ==
{{reflist}}

Expected Result

  • Should deduplicate 6 references total
  • Einstein book appears 4 times → 3 deduplications
  • Bohr journal appears 3 times → 2 deduplications
  • URL appears 2 times → 1 deduplication
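
The arithmetic behind the total is "occurrences minus one" for each duplicated reference, summed over all of them. A minimal Python sketch of that count follows; the function name and the labels in the dictionary are illustrative only.

```python
def deduplication_count(occurrences):
    """Sum (count - 1) over every reference that appears more than once."""
    return sum(count - 1 for count in occurrences.values() if count > 1)


# Occurrence counts taken from this test case.
counts = {"Einstein book": 4, "Bohr journal": 3, "Scientific American URL": 2}
print(deduplication_count(counts))  # 3 + 2 + 1 = 6
```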

Test Case 7: Edge Case - Single Reference (No Deduplication)


Input

== Single Reference ==
This article has only unique references.<ref>{{cite web|url=https://example1.com|title=First Source}}</ref>

Another unique reference.<ref>{{cite web|url=https://example2.com|title=Second Source}}</ref>

And a third unique one.<ref>{{cite web|url=https://example3.com|title=Third Source}}</ref>

== References ==
{{reflist}}

Expected Output

== Single Reference ==
This article has only unique references.<ref>{{cite web|url=https://example1.com|title=First Source}}</ref>

Another unique reference.<ref>{{cite web|url=https://example2.com|title=Second Source}}</ref>

And a third unique one.<ref>{{cite web|url=https://example3.com|title=Third Source}}</ref>

== References ==
{{reflist}}

Expected Result

  • Should deduplicate 0 references
  • No changes to content
  • Script should report "No duplicate references found"

Test Case 8: Whitespace Variations (Should Still Match)

Input

== Whitespace Test ==
Reference with normal spacing.<ref>{{cite web | url=https://example.com | title=Test Article | date=2024-01-01}}</ref>

Same reference with different spacing.<ref>{{cite web|url=https://example.com|title=Test Article|date=2024-01-01}}</ref>

Another with extra spaces.<ref>{{cite web  |  url=https://example.com  |  title=Test Article  |  date=2024-01-01}}</ref>

== References ==
{{reflist}}

Expected Output

== Whitespace Test ==
Reference with normal spacing.<ref name="testarticle">{{cite web | url=https://example.com | title=Test Article | date=2024-01-01}}</ref>

Same reference with different spacing.<ref name="testarticle" />

Another with extra spaces.<ref name="testarticle" />

== References ==
{{reflist}}

Expected Result

  • Should deduplicate 2 references
  • Whitespace normalization should treat these as identical
  • Name generated from title: "testarticle"
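
A possible form of the whitespace normalization is shown below in Python. It is a sketch under assumptions: the page does not say exactly which whitespace is ignored, so this version collapses runs of whitespace and drops spaces around `|` and `=`. The normalized text would serve only as a comparison key; the expected output keeps the first reference's original spacing.

```python
import re


def normalize_ref(body):
    """Hypothetical comparison key: ignore spacing differences inside a ref."""
    collapsed = re.sub(r"\s+", " ", body.strip())
    return re.sub(r"\s*([|=])\s*", r"\1", collapsed)


variants = [
    "{{cite web | url=https://example.com | title=Test Article | date=2024-01-01}}",
    "{{cite web|url=https://example.com|title=Test Article|date=2024-01-01}}",
    "{{cite web  |  url=https://example.com  |  title=Test Article  |  date=2024-01-01}}",
]
# All three variants from this test case collapse to a single key.
print(len({normalize_ref(v) for v in variants}))
```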

Test Case 9: Self-Closing Named References (Should Not Process)

Input

== Self-Closing Named Refs ==
First occurrence.<ref name="example">{{cite web|url=https://example.com|title=Example}}</ref>

Reuse with self-closing tag.<ref name="example" />

Another reuse.<ref name="example"/>

Duplicate unnamed reference.<ref>{{cite book|title=Book Name|author=Author}}</ref>

Same book again.<ref>{{cite book|title=Book Name|author=Author}}</ref>

== References ==
{{reflist}}

Expected Output

== Self-Closing Named Refs ==
First occurrence.<ref name="example">{{cite web|url=https://example.com|title=Example}}</ref>

Reuse with self-closing tag.<ref name="example" />

Another reuse.<ref name="example"/>

Duplicate unnamed reference.<ref name="bookname">{{cite book|title=Book Name|author=Author}}</ref>

Same book again.<ref name="bookname" />

== References ==
{{reflist}}

Expected Result

  • Should deduplicate 1 reference (only the unnamed book)
  • Should NOT modify the already-named "example" references
  • Should handle both `<ref name="example" />` and `<ref name="example"/>` formats
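
The input of this test case uses two spellings of the self-closing tag, `<ref name="example" />` and `<ref name="example"/>`. A pattern that accepts optional whitespace before the slash covers both. The Python regex below is an assumed illustration, not the pattern the script uses.

```python
import re

# \s* before "/>" accepts both '<ref name="example" />' and '<ref name="example"/>'.
SELF_CLOSING_NAMED_REF_RE = re.compile(r'<ref\s+name\s*=\s*"([^"]+)"\s*/>')

text = (
    'First occurrence.<ref name="example">{{cite web|url=https://example.com|title=Example}}</ref>\n'
    'Reuse with self-closing tag.<ref name="example" />\n'
    'Another reuse.<ref name="example"/>\n'
)
print(SELF_CLOSING_NAMED_REF_RE.findall(text))
```

Both reuses are recognised, while the opening tag of the full reference is not, because it does not end in `/>`.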

Instructions for Testing

  1. Copy a test case input into the Wikipedia edit box
  2. Run the DeduplicateReferences script
  3. Compare the result with the expected output
  4. Verify the deduplication count matches the expected result
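
Comparing long wikitext by eye is error-prone, so step 3 can be assisted with a small diff helper. The Python sketch below uses the standard `difflib` module; the function name `diff_against_expected` is hypothetical, and the helper is a suggestion rather than part of the script.

```python
import difflib


def diff_against_expected(actual, expected):
    """Return unified-diff lines; an empty list means the texts are identical."""
    return list(difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    ))


expected = 'More text here.<ref name="firstarticle" />'
actual = 'More text here.<ref name="firstarticle"/>'
for line in diff_against_expected(actual, expected):
    print(line)
```

Here the diff reports a difference, because the two self-closing tags are spelled differently even though they render the same way.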

Notes

  • The script only deduplicates exact matches (after whitespace normalization)
  • Already-named references are never modified
  • The script generates names from titles (cite templates), domains (URLs), or the first 3 words (plain text)
  • Name conflicts are handled by appending numbers (_1, _2, etc.)
  • Blacklisted names (like "doi_org", "cite_web") are not used
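
The conflict and blacklist rules in the notes above can be sketched as follows in Python. Several details are assumptions: the function name `unique_name`, the blacklist holding only the two examples named in the notes, and especially the `fallback` name used when a candidate is blacklisted, since the page does not say what the script does in that case.

```python
# Only the two examples given in the notes; the real list may be longer.
BLACKLIST = {"doi_org", "cite_web"}


def unique_name(candidate, used, fallback="ref"):
    """Return a name not yet in `used`, appending _1, _2, ... on conflict.

    `fallback` is a hypothetical placeholder for blacklisted or empty candidates.
    """
    base = fallback if (not candidate or candidate in BLACKLIST) else candidate
    name = base
    suffix = 1
    while name in used:
        name = f"{base}_{suffix}"
        suffix += 1
    used.add(name)
    return name


used = set()
print(unique_name("testarticle", used))
print(unique_name("testarticle", used))
print(unique_name("testarticle", used))
```

The three calls print `testarticle`, `testarticle_1` and `testarticle_2`, matching the suffix pattern described in the notes.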