Share to: share facebook share twitter share wa share telegram print page

Run-length encoding

Run-length encoding (RLE) is a form of lossless data compression in which runs of data (consecutive occurrences of the same data value) are stored as a single occurrence of that data value and a count of its consecutive occurrences, rather than as the original run. As an imaginary example of the concept, when encoding an image built up from colored dots, the sequence "green green green green green green green green green" is shortened to "green x 9". This is most efficient on data that contains many such runs, for example, simple graphic images such as icons, line drawings, games, and animations. For files that do not have many runs, encoding them with RLE could increase the file size.

RLE may also refer in particular to an early graphics file format supported by CompuServe for compressing black and white images, that was widely supplanted by their later Graphics Interchange Format (GIF).

RLE also refers to a little-used image format in Windows 3.x that is saved with the file extension rle; it is a run-length encoded bitmap, and the format was used for the Windows 3.x startup screen.

History and applications

Run-length encoding (RLE) schemes were employed in the transmission of analog television signals as far back as 1967.[1] In 1983, run-length encoding was patented by Hitachi.[2][3][4] RLE is particularly well suited to palette-based bitmap images (which use relatively few colours) such as computer icons, and was a popular image compression method on early online services such as CompuServe before the advent of more sophisticated formats such as GIF.[5] It does not work well on continuous-tone images (which use very many colours) such as photographs, although JPEG uses it on the coefficients that remain after transforming and quantizing image blocks.

Common formats for run-length encoded data include Truevision TGA, PackBits (by Apple, used in MacPaint), PCX and ILBM. The International Telecommunication Union also describes a standard to encode run-length colour for fax machines, known as T.45.[6] That fax colour coding standard, which along with other techniques is incorporated into Modified Huffman coding,[citation needed] is relatively efficient because most faxed documents are primarily white space, with occasional interruptions of black.

Algorithm

RLE has a space complexity of , where n is the size of the input data.

Encoding algorithm

Run-length encoding compresses data by reducing the physical size of a repeating string of characters. This process involves converting the input data into a compressed format by identifying and counting consecutive occurrences of each character. The steps are as follows:

  1. Traverse the input data.
  2. Count the number of consecutive repeating characters (run length).
  3. Store the character and its run length.

Python implementation

Imports and helper functions
from itertools import repeat, compress, groupby


def ilen(iterable):
    """
    Return the number of items in iterable.

    >>> ilen(x for x in range(1000000) if x % 3 == 0)
    333334
    """
    # using zip() to wrap the input with 1-tuples which compress() reads as true values.
    return sum(compress(repeat(1), zip(iterable)))
def rle_encode(iterable, *, length_first=True):
    """
    >>> "".join(rle_encode("AAAABBBCCDAA"))
    '4A3B2C1D2A'
    >>> "".join(rle_encode("AAAABBBCCDAA", length_first=False))
    'A4B3C2D1A2'
    """
    return (
        f"{ilen(g)}{k}" if length_first else f"{k}{ilen(g)}" # ilen(g): length of iterable g
        for k, g in groupby(iterable)
    )

[7]

Decoding algorithm

The decoding process involves reconstructing the original data from the encoded format by repeating characters according to their counts. The steps are as follows:

  1. Traverse the encoded data.
  2. For each count-character pair, repeat the character count times.
  3. Append these characters to the result string.

Python implementation

Imports
from itertools import chain, repeat, batched
def rle_decode(iterable, *, length_first=True):
    """
    >>> "".join(rle_decode("4A3B2C1D2A"))
    'AAAABBBCCDAA'
    >>> "".join(rle_decode("A4B3C2D1A2", length_first=False))
    'AAAABBBCCDAA'
    """
    return chain.from_iterable(
        repeat(b, int(a)) if length_first else repeat(a, int(b))
        for a, b in batched(iterable, 2)
    )

[7]

Example

Consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows:

WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

With a run-length encoding (RLE) data compression algorithm applied to the above hypothetical scan line, it can be rendered as follows:

12W1B12W3B24W1B14W


This can be interpreted as a sequence of twelve Ws, one B, twelve Ws, three Bs, etc., and represents the original 67 characters in only 18. While the actual format used for the storage of images is generally binary rather than ASCII characters like this, the principle remains the same. Even binary data files can be compressed with this method; file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as DEFLATE often use LZ77-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as BWWBWWBWWBWW).

Run-length encoding can be expressed in multiple ways to accommodate data properties as well as additional compression algorithms. For instance, one popular method encodes run lengths for runs of two or more characters only, using an "escape" symbol to identify runs, or using the character itself as the escape, so that any time a character appears twice it denotes a run. On the previous example, this would give the following:

WW12BWW12BB3WW24BWW14

This would be interpreted as a run of twelve Ws, a B, a run of twelve Ws, a run of three Bs, etc. In data where runs are less frequent, this can significantly improve the compression rate.

One other matter is the application of additional compression algorithms. Even with the runs extracted, the frequencies of different characters may be large, allowing for further compression; however, if the run lengths are written in the file in the locations where the runs occurred, the presence of these numbers interrupts the normal flow and makes it harder to compress. To overcome this, some run-length encoders separate the data and escape symbols from the run lengths, so that the two can be handled independently. For the example data, this would result in two outputs, the string "WWBWWBBWWBWW" and the numbers (12,12,3,24,14).

Variants

  • Sequential RLE: This method processes data one line at a time, scanning from left to right. It is commonly employed in image compression. Other variations of this technique include scanning the data vertically, diagonally, or in blocks.
  • Lossy RLE: In this variation, some bits are intentionally discarded during compression (often by setting one or two significant bits of each pixel to 0). This leads to higher compression rates while minimally impacting the visual quality of the image.
  • Adaptive RLE: Uses different encoding schemes depending on the length of runs to optimize compression ratios. For example, short runs might use a different encoding format than long runs.

See also

References

  1. ^ Robinson, A. H.; Cherry, C. (1967). "Results of a prototype television bandwidth compression scheme". Proceedings of the IEEE. 55 (3). IEEE: 356–364. doi:10.1109/PROC.1967.5493.
  2. ^ "Run Length Encoding Patents". Internet FAQ Consortium. 21 March 1996. Retrieved 14 July 2019.
  3. ^ "Method and system for data compression and restoration". Google Patents. 7 August 1984. Retrieved 14 July 2019.
  4. ^ "Data recording method". Google Patents. 8 August 1983. Retrieved 14 July 2019.
  5. ^ Dunn, Christopher (1987). "Smile! You're on RLE!" (PDF). The Transactor. 7 (6). Transactor Publishing: 16–18. Retrieved 2015-12-06.
  6. ^ Recommendation T.45 (02/00): Run-length colour encoding. International Telecommunication Union. 2000. Retrieved 2015-12-06.
  7. ^ a b "more-itertools 10.4.0 documentation". August 2024.

Read other articles:

Jalan papan (棧道) atau dalam bahasa Inggris dinamakan gallery roads adalah jalan melalui daerah pegunungan terpencil Tiongkok. Jalan ini terbuat dari papan kayu yang didirikan di atas lubang yang dipotong pada sisi tebing. Terutama digunakan di Pegunungan Qin yang menghubungkan lembah Sungai Wei dan Sungai Han. Jalan papan pertama dibangun selama Periode Negara Berperang (476-221 SM) dan telah digunakan oleh negara Qin[1] termasuk untuk menyerang negara Shu serta Ba. Jalan ini sepenuh…

Etena Nama Nama IUPAC Ethene Penanda Nomor CAS 74-85-1 Y Model 3D (JSmol) Gambar interaktif 3DMet {{{3DMet}}} ChEBI CHEBI:18153 Y ChEMBL ChEMBL117822 Y ChemSpider 6085 Y Nomor EC KEGG C06547 Y PubChem CID 6325 Nomor RTECS {{{value}}} UNII 91GW059KN7 Y CompTox Dashboard (EPA) DTXSID1026378 InChI InChI=1S/C2H4/c1-2/h1-2H2 YKey: VGGSQFUCUMXWEO-UHFFFAOYSA-N YInChI=1/C2H4/c1-2/h1-2H2Key: VGGSQFUCUMXWEO-UHFFFAOYAE SMILES C=C Sifat Rumus kimia C2H4 …

Часть серии статей о Холокосте Идеология и политика Расовая гигиена · Расовый антисемитизм · Нацистская расовая политика · Нюрнбергские расовые законы Шоа Лагеря смерти Белжец · Дахау · Майданек · Малый Тростенец · Маутхаузен · …

Artikel ini sebatang kara, artinya tidak ada artikel lain yang memiliki pranala balik ke halaman ini.Bantulah menambah pranala ke artikel ini dari artikel yang berhubungan atau coba peralatan pencari pranala.Tag ini diberikan pada Oktober 2022. GPIB Immanuel MedanGereja Protestan di Indonesia bagian BaratGedung Gereja Immanuel, MedanLokasiKota Medan, IndonesiaDenominasiCalvinisArsitekturStatus fungsionalAktifPenetapan warisanATipe arsitekturGereja Cagar budaya IndonesiaGereja ImmanuelPeringkatNa…

يتزايد معدل مشاركة النساء في الألعاب الأولمبية منذ مشاركتهن الأولى في عام 1900. بعض الألعاب الرياضية مخصصة للنساء، والبعض الآخر يتنافس عليه كلا الجنسين، بينما تبقى بعض الألعاب الرياضية القديمة للرجال فقط. تُظهر دراسات التغطية الإعلامية للأولمبياد باستمرار الاختلافات في طرق…

Bombardier Challenger 850 adalah sebuah pesawat bisnis terbesar super-mid size yang ditawarkan oleh Bombardier Aerospace. Pesawat bisnis ini didasarkan pada Bombardier CRJ200LR 50-kursi. Bombardier Challenger 850 adalah versi diperbarui saat ini. Challenger 850 berasal dari pesawat Bombardier CRJ200. Pesawat ini mampu menampung 15-19 penumpang. Challenger 850 jet memiliki jarak antar benua dan kecepatan tinggi Mach 0,80. Referensi lbsPesawat Bombardier AerospaceDaftar pesawat Bombardier Aerospac…

Process of extracting tin from the ground Tin mining began early in the Bronze Age, as bronze is a copper-tin alloy. Tin is a relatively rare element in the Earth's crust, with approximately 2 ppm (parts per million), compared to iron with 50,000 ppm. History Main article: Tin sources and trade in ancient times Tin extraction and use can be dated to the beginnings of the Bronze Age around 3000 BC, when it was observed that copper objects formed of polymetallic ores with different metal contents …

The Emory WheelTypeStudent newspaperFormatBroadsheetOwner(s)Independently financed and operatedEditor-in-chiefMadi Olivier and Sophia PeyserFounded1919HeadquartersAtlanta, GeorgiaUSAWebsitewww.emorywheel.com The Emory Wheel is the independent, student-run newspaper at Emory University in Atlanta, Georgia. The Wheel is published every other week on Wednesday during the regular school year, and is updated daily on its website. The sections of the Wheel include News, Opinion, Sports, Arts & Ent…

Artikel ini sebatang kara, artinya tidak ada artikel lain yang memiliki pranala balik ke halaman ini.Bantulah menambah pranala ke artikel ini dari artikel yang berhubungan atau coba peralatan pencari pranala.Tag ini diberikan pada November 2022. Arif YunusovLahir12 Januari 1955 (umur 69)Baku, AzerbaijanKebangsaanAzerbaijanPekerjaanPenulis, sejarawan, dan aktivis hak asasi manusiaOrganisasiInstitut Perdamaian dan DemokrasiDikenal atasAktivisme hak asasi manusia Arif Yunusov (lahir 12 Januari…

6th season of the top-tier football league in Uruguay Football league seasonPrimera DivisiónSeason1906ChampionsMontevideo WanderersMatches played30Goals scored75 (2.5 per match)Top goalscorer Rafael de Miquelenera (6) (Montevideo Wanderers)← 1905 1907 → The 1905 Primera División was the 5th. season of top-flight football in Uruguay. Overview The tournament consisted of a round-robin championship. It involved six teams, and the champion was Montevideo Wanderers, being the first time th…

Failed rocket launch Vanguard SLV-3Vanguard rocket on LC-18A prior to its launchNamesVanguard Space Launch Vehicle-3Mission typeInternational Geophysical YearOperatorNaval Research LaboratoryMission durationFailed to orbit Spacecraft propertiesSpacecraftVanguard 2DSpacecraft typeVanguardManufacturerNaval Research LaboratoryLaunch mass10.6 kg (23 lb)Dimensions50.8 cm (20.0 in) of diameter Start of missionLaunch date26 September 1958, 15:38 GMTRocketVanguard SLV-3Launch siteCap…

You can help expand this article with text translated from the corresponding article in Hungarian. (December 2009) Click [show] for important translation instructions. View a machine-translated version of the Hungarian article. Machine translation, like DeepL or Google Translate, is a useful starting point for translations, but translators must revise errors as necessary and confirm that the translation is accurate, rather than simply copy-pasting machine-translated text into the English Wi…

Season of television series Teen WolfSeason 6DVD cover for both partsStarring Tyler Posey Dylan O'Brien Holland Roden Shelley Hennig Dylan Sprayberry Linden Ashby Melissa Ponzio JR Bourne No. of episodes20ReleaseOriginal networkMTVOriginal releaseNovember 15, 2016 (2016-11-15) –September 24, 2017 (2017-09-24)Season chronology← PreviousSeason 5List of episodes The sixth and final season of Teen Wolf, an American supernatural drama created by Jeff Davis and to some extent b…

National Football League franchise in Nashville, Tennessee Tennessee Titans Current seasonEstablished August 14, 1959; 64 years ago (August 14, 1959)[1]First season: 1960Play in Nissan StadiumNashville, TennesseeHeadquartered in Ascension Saint Thomas Sports ParkNashville, Tennessee[2] Tennessee Titans logoTennessee Titans wordmarkLogoWordmarkLeague/conference affiliations American Football League Eastern Division (1960–1969) National Football League (1970–pr…

Artikel ini membutuhkan rujukan tambahan agar kualitasnya dapat dipastikan. Mohon bantu kami mengembangkan artikel ini dengan cara menambahkan rujukan ke sumber tepercaya. Pernyataan tak bersumber bisa saja dipertentangkan dan dihapus.Cari sumber: Ol Doinyo Lengai – berita · surat kabar · buku · cendekiawan · JSTOR (Oktober 2021) Ol Doinyo Lengai (Oldoinyo Lengai)Titik tertinggiKetinggian3.188 m (10.459 ft)[1]Koordinat02°45′52″S 35…

† Египтопитек Реконструкция внешнего вида египтопитека Научная классификация Домен:ЭукариотыЦарство:ЖивотныеПодцарство:ЭуметазоиБез ранга:Двусторонне-симметричныеБез ранга:ВторичноротыеТип:ХордовыеПодтип:ПозвоночныеИнфратип:ЧелюстноротыеНадкласс:Четвероноги…

Pour les articles homonymes, voir Hoste. Cet article est une ébauche concernant un coureur cycliste belge. Vous pouvez partager vos connaissances en l’améliorant (comment ?). Pour plus d’informations, voyez le projet cyclisme. Frank HosteInformationsNaissance 29 août 1955 (68 ans)GandNationalité belgeÉquipes professionnelles 08.1977-12.1977[n 1]IJsboerke-Colnago1978IJsboerke-Gios1979Marc Zeep Savon-Superia1980Marc-Carlos-VRD-Woningbouw1981TI-Raleigh-Creda1982TI-Raleigh-Campagn…

本條目存在以下問題,請協助改善本條目或在討論頁針對議題發表看法。 此條目需要补充更多来源。 (2017年12月19日)请协助補充多方面可靠来源以改善这篇条目,无法查证的内容可能會因為异议提出而被移除。致使用者:请搜索一下条目的标题(来源搜索:若望保祿二世 — 网页、新闻、书籍、学术、图像),以检查网络上是否存在该主题的更多可靠来源(判定指引)。 此…

Герберт Гольновнем. Herbert Gollnow Дата рождения 13 июля 1911(1911-07-13) Место рождения Берлин, Германская империя Дата смерти 12 февраля 1943(1943-02-12) (31 год) Место смерти Берлин, Германия Страна  Германский рейх Род деятельности член движения Сопротивления во время Второй мир…

هنودمعلومات عامةنسبة التسمية الهند التعداد الكليالتعداد قرابة 1.21 مليار[1][2]تعداد الهند عام 2011ق. 1.32 مليار[3]تقديرات عام 2017ق. 30.8 مليون[4]مناطق الوجود المميزةبلد الأصل الهند البلد الهند  الهند نيبال 4,000,000[5] الولايات المتحدة 3,982,398[6] الإمارا…

Kembali kehalaman sebelumnya