mirror of https://github.com/codedread/bitjs synced 2025-10-03 17:49:16 +02:00
# bitjs.archive
This package includes objects for unarchiving binary data in popular archive formats (zip, rar, tar),
providing unzip, unrar and untar capabilities via JavaScript in the browser or in other JavaScript
runtimes (Node, Deno, Bun).
A prototype version of a compressor that creates zip files is also present. Decompression and
compression happen inside a Web Worker if the runtime supports it (browsers, Deno).
The API is event-based. You will want to subscribe to some of these events:
* 'progress': Periodic updates on the progress (bytes processed).
* 'extract': Sent whenever a single file in the archive was fully decompressed.
* 'finish': Sent when decompression/compression is complete.
## Decompressing
### Simple Example of unzip
Here is a simple example of unzipping a file. It is assumed the zip file exists as an
[`ArrayBuffer`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer),
which you can get via
[`XHR`](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest_API/Sending_and_Receiving_Binary_Data),
from a [`Blob`](https://developer.mozilla.org/en-US/docs/Web/API/Blob/arrayBuffer),
[`Fetch`](https://developer.mozilla.org/en-US/docs/Web/API/Response/arrayBuffer),
[`FileReader`](https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsArrayBuffer),
etc.
```javascript
import { Unzipper } from './bitjs/archive/decompress.js';
const unzipper = new Unzipper(zipFileArrayBuffer);
unzipper.addEventListener('extract', (evt) => {
  const {filename, fileData} = evt.unarchivedFile;
  console.log(`unzipped ${filename} (${fileData.byteLength} bytes)`);
  // Do something with fileData...
});
unzipper.addEventListener('finish', () => console.log(`Finished!`));
unzipper.start();
```
`start()` is an async method that returns a `Promise`, which resolves when the unzipping is complete,
so you can `await` it if you need to.
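
Because `start()` returns a `Promise`, the event listeners and the `await` can be combined into one helper. The sketch below is a minimal illustration, not part of bitjs: `extractAll` is a hypothetical name, and it assumes that every `'extract'` event has fired by the time the `Promise` from `start()` resolves.

```javascript
// Collects every extracted file into a Map keyed by filename.
// `unarchiver` is assumed to expose addEventListener() and an async start(),
// as the Unzipper in the example above does.
async function extractAll(unarchiver) {
  const files = new Map();
  unarchiver.addEventListener('extract', (evt) => {
    const { filename, fileData } = evt.unarchivedFile;
    files.set(filename, fileData);
  });
  // Awaiting start() waits for the whole archive to be processed.
  await unarchiver.start();
  return files;
}
```

It could then be used as `const files = await extractAll(new Unzipper(zipFileArrayBuffer));`.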
### Progressive unzipping
The unarchivers also support progressive decoding while the file is still streaming in, which helps
if you are receiving the zipped file from a slow source (a Cloud API, for instance). Send the first
`ArrayBuffer` in the constructor, and send subsequent `ArrayBuffer`s using the `update()` method.
```javascript
import { Unzipper } from './bitjs/archive/decompress.js';
const unzipper = new Unzipper(anArrayBufferWithStartingBytes);
unzipper.addEventListener('extract', () => { /* ... */ });
unzipper.addEventListener('finish', () => { /* ... */ });
unzipper.start();
// ...
// after some time
unzipper.update(anArrayBufferWithMoreBytes);
// ...
// after some more time
unzipper.update(anArrayBufferWithYetMoreBytes);
```
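
One way to drive `update()` is from a `ReadableStream`, such as the `body` of a `fetch()` response. The following is a sketch under stated assumptions: `unarchiveStream` and the `createUnarchiver` factory are hypothetical names supplied by the caller, the factory is expected to attach any event listeners before returning, and `start()` is assumed to resolve once all bytes have been supplied.

```javascript
// Feeds a ReadableStream of Uint8Array chunks into an unarchiver.
// `createUnarchiver` is a caller-supplied factory, e.g. (ab) => new Unzipper(ab).
async function unarchiveStream(stream, createUnarchiver) {
  const reader = stream.getReader();
  // Copy out exactly the bytes of one chunk as a standalone ArrayBuffer.
  const toArrayBuffer = (chunk) =>
    chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength);

  const first = await reader.read();
  if (first.done) throw new Error('stream was empty');

  // The first chunk goes to the constructor...
  const unarchiver = createUnarchiver(toArrayBuffer(first.value));
  const finished = unarchiver.start();

  // ...and every later chunk goes through update().
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    unarchiver.update(toArrayBuffer(value));
  }
  return finished;
}
```

With `fetch`, this might be called as `await unarchiveStream(response.body, (ab) => new Unzipper(ab))`, where `response` is the result of an earlier `fetch()` call.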
### getUnarchiver()
If you don't want to bother with figuring out if you have a zip, rar, or tar file, you can use the
convenience method `getUnarchiver()`, which sniffs the bytes for you and creates the appropriate
unarchiver.
```javascript
import { getUnarchiver } from './bitjs/archive/decompress.js';
const unarchiver = getUnarchiver(anArrayBuffer);
unarchiver.addEventListener('extract', () => { /* ... */ });
// etc...
unarchiver.start();
```
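
To give a sense of what "sniffing the bytes" can mean, here is a minimal sketch that checks well-known signature bytes. It is an illustration only and is not bitjs's actual detection code; `sniffArchiveType` is a hypothetical name. Note that tar archives written in the older pre-POSIX format carry no `ustar` marker, so this sketch would report them as unknown.

```javascript
// Guesses the archive type from well-known signature bytes.
function sniffArchiveType(arrayBuffer) {
  const bytes = new Uint8Array(arrayBuffer);
  // True if the bytes at `offset` match the signature `sig`.
  const startsWith = (sig, offset = 0) =>
    bytes.length >= offset + sig.length &&
    sig.every((b, i) => bytes[offset + i] === b);

  if (startsWith([0x50, 0x4b])) return 'zip';             // "PK"
  if (startsWith([0x52, 0x61, 0x72, 0x21])) return 'rar'; // "Rar!"
  // POSIX tar headers store "ustar" at byte offset 257.
  if (startsWith([0x75, 0x73, 0x74, 0x61, 0x72], 257)) return 'tar';
  return 'unknown';
}
```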
### Non-Browser JavaScript Runtime Examples
The API works in other JavaScript runtimes too (Node, Deno, Bun).
#### NodeJS
```javascript
import * as fs from 'fs';
import { getUnarchiver } from './archive/decompress.js';
const nodeBuf = fs.readFileSync('comic.cbz');
// NOTE: Small files may not have a zero byte offset in Node, so we slice().
// See https://nodejs.org/api/buffer.html#bufbyteoffset.
const ab = nodeBuf.buffer.slice(nodeBuf.byteOffset, nodeBuf.byteOffset + nodeBuf.length);
const unarchiver = getUnarchiver(ab);
unarchiver.addEventListener('progress', () => process.stdout.write('.'));
unarchiver.addEventListener('extract', (evt) => {
  const {filename, fileData} = evt.unarchivedFile;
  console.log(`${filename} (${fileData.byteLength} bytes)`);
});
unarchiver.addEventListener('finish', () => console.log(`Done!`));
unarchiver.start();
```
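
The `slice()` in the Node example matters because a `Buffer` is a view onto an underlying `ArrayBuffer` that may be larger than the data itself. The sketch below isolates that conversion; `bufferToArrayBuffer` is a hypothetical helper name, not part of bitjs.

```javascript
// Returns an ArrayBuffer holding exactly the bytes that a Node Buffer views.
function bufferToArrayBuffer(nodeBuf) {
  return nodeBuf.buffer.slice(
    nodeBuf.byteOffset,
    nodeBuf.byteOffset + nodeBuf.byteLength
  );
}

// A small Buffer may be carved out of a larger shared allocation, so
// small.buffer can be bigger than the Buffer and start at a non-zero offset.
const small = Buffer.from('PK');
const exact = bufferToArrayBuffer(small);
console.log(small.buffer.byteLength, small.byteOffset, exact.byteLength);
```

Passing `nodeBuf.buffer` directly would hand the unarchiver whatever else lives in that shared allocation, which is why the example copies out only the relevant range.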
#### Deno
```typescript
import { UnarchiveExtractEvent } from './archive/events.js';
import { getUnarchiver } from './archive/decompress.js';
// Deno.writeAll() is deprecated in newer Deno releases, so write to stdout directly.
const print = (s: string) => Deno.stdout.writeSync(new TextEncoder().encode(s));
async function go() {
  const arr: Uint8Array = await Deno.readFile('example.zip');
  const unarchiver = getUnarchiver(arr.buffer);
  unarchiver.addEventListener('extract', (evt) => {
    const {filename, fileData} = (evt as UnarchiveExtractEvent).unarchivedFile;
    print(`\n${filename} (${fileData.byteLength} bytes)\n`);
    // Do something with fileData...
  });
  unarchiver.addEventListener('finish', () => { console.log(`Done!`); Deno.exit(); });
  unarchiver.addEventListener('progress', () => print('.'));
  unarchiver.start();
}
await go();
```
## Compressing
The Zipper only supports creating zip files without compression (store only) for now. The interface
is pretty straightforward and there is no event-based / streaming API.
```javascript
import { Zipper } from './bitjs/archive/compress.js';
const zipper = new Zipper();
const now = Date.now();
// Create a zip file with files foo.jpg and bar.txt.
const zippedArrayBuffer = await zipper.start(
  [
    {
      fileName: 'foo.jpg',
      lastModTime: now,
      fileData: fooArrayBuffer,
    },
    {
      fileName: 'bar.txt',
      lastModTime: now,
      fileData: barArrayBuffer,
    }
  ],
  true /* isLastFile */);
```
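
When the files already live in an object keyed by name, the descriptor array can be built programmatically. This is a small sketch: `toZipEntries` is a hypothetical helper, not part of bitjs, and it only reuses the `fileName`, `lastModTime` and `fileData` fields shown in the example above.

```javascript
// Builds the array of file descriptors passed to zipper.start().
// `filesByName` maps a file name to its ArrayBuffer.
function toZipEntries(filesByName, lastModTime = Date.now()) {
  return Object.entries(filesByName).map(([fileName, fileData]) => ({
    fileName,
    lastModTime,
    fileData,
  }));
}
```

It might be used as `await zipper.start(toZipEntries({'foo.jpg': fooArrayBuffer, 'bar.txt': barArrayBuffer}), true /* isLastFile */);`.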
## Implementation Details
All you generally need to worry about is calling `getUnarchiver()`, listening for events, and then calling `start()`. However, if you are interested in how it works under the covers, read on...
The implementations are written in pure JavaScript and communicate with the host software (the thing that wants to do the unzipping) via a `MessageChannel`. The host and the implementation each own a `MessagePort` and pass messages to each other through it. In a web browser, the implementation runs as a Web Worker so that the CPU-intensive work stays off the main UI thread.
```mermaid
sequenceDiagram
    participant Host Code
    participant Port1
    box Any JavaScript Context (could be a Web Worker)
    participant Port2
    participant unrar.js
    end
    Host Code->>Port1: postMessage(rar bytes)
    Port1-->>Port2: (MessageChannel)
    Port2->>unrar.js: onmessage(rar bytes)
    Note right of unrar.js: unrar the thing
    unrar.js->>Port2: postMessage(an extracted file)
    Port2-->>Port1: (MessageChannel)
    Port1->>Host Code: onmessage(an extracted file)
    unrar.js->>Port2: postMessage(2nd extracted file)
    Port2-->>Port1: (MessageChannel)
    Port1->>Host Code: onmessage(2nd extracted file)
```
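
The exchange in the diagram can be sketched with a bare `MessageChannel`, without any archive code. This is an illustration of the pattern only: the function name `runOverMessageChannel` and the message shapes (`{ bytes }` and `{ type, filename, byteLength }`) are made up for this sketch and are not the messages bitjs actually sends.

```javascript
// The host and the "implementation" each hold one end of a MessageChannel.
function runOverMessageChannel(bytes) {
  const { port1: hostPort, port2: implPort } = new MessageChannel();

  // Implementation side: receive the bytes, then report one "extracted file".
  implPort.onmessage = (evt) => {
    const received = new Uint8Array(evt.data.bytes);
    implPort.postMessage({
      type: 'extract',
      filename: 'example.bin', // placeholder name for this sketch
      byteLength: received.byteLength,
    });
  };

  // Host side: send the bytes and resolve with the first reply.
  return new Promise((resolve) => {
    hostPort.onmessage = (evt) => {
      hostPort.close(); // closing one port shuts down the channel
      resolve(evt.data);
    };
    hostPort.postMessage({ bytes });
  });
}
```

In this sketch both ports live in the same context. In a browser, one port would typically be handed to a Web Worker so that the implementation side runs off the main thread.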