Field | Value | Language
dc.contributor.author | Nguyen, Vu H. |
dc.contributor.author | Nguyen, Hien T. |
dc.contributor.author | Duong, Hieu N. |
dc.contributor.author | Snášel, Václav |
dc.date.accessioned | 2017-01-05T07:13:14Z |
dc.date.available | 2017-01-05T07:13:14Z |
dc.date.issued | 2016 |
dc.identifier.citation | Computational Intelligence and Neuroscience. 2016, art. no. 9483646. | cs
dc.identifier.issn | 1687-5265 |
dc.identifier.issn | 1687-5273 |
dc.identifier.uri | http://hdl.handle.net/10084/116564 |
dc.description.abstract | We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It achieves a significantly better compression ratio than state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them using n-gram dictionaries. In the encoding phase, we use a sliding window whose size ranges from bigrams to five-grams to obtain the best encoding stream. Each n-gram is encoded in two to four bytes, depending on its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from several Vietnamese news agencies to build n-gram dictionaries from unigrams to five-grams, obtaining dictionaries with a total size of 12 GB. To evaluate our method, we collected a test set of 10 text files of different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods. | cs
dc.format.extent | 1900833 bytes |
dc.format.mimetype | application/pdf |
dc.language.iso | en | cs
dc.publisher | Hindawi | cs
dc.relation.ispartofseries | Computational Intelligence and Neuroscience | cs
dc.relation.uri | http://dx.doi.org/10.1155/2016/9483646 | cs
dc.rights | Copyright © 2016 Vu H. Nguyen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. | cs
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | cs
dc.title | n-Gram-based text compression | cs
dc.type | article | cs
dc.identifier.doi | 10.1155/2016/9483646 |
dc.rights.access | openAccess |
dc.type.version | publishedVersion | cs
dc.type.status | Peer-reviewed | cs
dc.description.source | Web of Science | cs
dc.description.firstpage | art. no. 9483646 | cs
dc.identifier.wos | 000388857100001 |


