# books

[B]ooks - which is only one of the names this program goes by - is a front-end for a locally accessible libgen / libgen_fiction database instance, offering versatile search and download directly from the command line. The included `update_libgen` tool keeps the database up to date: if the database is older than a user-defined value it is updated before the query is executed. This generally only takes a few seconds, but it might take longer on a slow connection or after a long update interval. Updating can be temporarily disabled with the `-x` command line option. To refresh the database(s) from a dump file use the included `refresh_libgen` program; see '*update_libgen* vs. *refresh_libgen*' below for more information on which tool to use.

Books comes in three main flavours:

* `books` / `books-all` / `fiction`: CLI search interface which dumps results to the terminal, download through MD5
* `nbook` / `nfiction`: text-based browser offering limited preview and download
* `xbook` / `xfiction`: GUI-based browser offering preview and download

The *book* tools are based on the *libgen* database, the *fiction* tools use the *libgen_fiction* database. Apart from the fact that the *fiction* tools do not support all the search criteria offered by the *book* tools - due to differences in the database layout - all programs share the same interface.

The database can be searched in two modes: per-field (the default) and fulltext (which, of course, only searches book metadata, not the actual book contents). The current implementation of fulltext search is actually a pattern match over a number of concatenated database columns; it does not use MySQL's native fulltext search. The advantage of this implementation is that it does not need a fulltext index (which is not part of the libgen dump and would need to be generated locally); the disadvantage is that it does not offer more advanced natural language search options. Given the limited amount of 'natural language' available in the database the latter does not seem to be much of a disadvantage, and the implementation performs well.

In the (default) per-field search mode the database can be searched for patterns (SQL `LIKE` operator with leading and trailing wildcards) using lower-case options and/or exact matches using upper-case options. The fulltext search by necessity always uses pattern matching over the indicated fields ('title' and 'author' if no other fields are specified).
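
To make the two modes concrete, here is a sketch of the kind of SQL the options translate to, assuming the `updated` table and the default `LIMIT` used elsewhere in this README; the exact queries the scripts build may differ:

```bash
# -t odyssey: pattern match on the Title field (leading/trailing wildcards)
mysql -B libgen -e "SELECT MD5,Title FROM updated WHERE Title LIKE '%odyssey%' LIMIT 1000"
# -T 'The Odyssey': exact match on the Title field
mysql -B libgen -e "SELECT MD5,Title FROM updated WHERE Title = 'The Odyssey' LIMIT 1000"
# -f odyssey: pattern match over concatenated fields (Author and Title by default)
mysql -B libgen -e "SELECT MD5,Title FROM updated WHERE CONCAT_WS(' ',Author,Title) LIKE '%odyssey%' LIMIT 1000"
```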

Publications can be downloaded through IPFS, through torrents or from libgen download mirror servers, either by selecting them in the result list or by using the 'Download' button in the preview window. The `books` and `fiction` tools can also download publications based on their MD5 hash (use `-J ...`). When using the GUI-based tools in combination with the 'yad' tool, double-clicking a row in the result list shows a preview; the other tools generate previews for selected publications using the `-w` command line option.

See [Installation](#installation) for information on how to install *books*.

## How to use *books* et al.

I'll let the programs themselves do the talking:

```txt
$ books -h
books version 0.7

Use: books OPTIONS [like] [<PATTERN>]

(...)

SEARCH BY FIELD:

This is the default search mode. If no field options are given this searches
the Title field for the PATTERN. Capital options (-A, -T, etc) for exact match,
lower-case (-a, -t, etc) for pattern match.

FULLTEXT SEARCH (-f):

Performs a pattern match search over all fields indicated by the options. If no
field options are given, perform a pattern match search over the Author and
Title fields.

Depending on which name this program is executed under it behaves differently:

books:      query database and show results, direct download with md5
books-all:  query database and show results (exhaustive search over all tables, slow)

nbook:      select publications for download from list (terminal-based)
xbook:      select publications for download from list (GUI)

fiction:    query database and show results (using 'fiction' database), direct download with md5

nfiction:   select publications for download from list (terminal-based, use 'fiction' database)
xfiction:   select publications for download from list (GUI, use 'fiction' database)

OPTIONS

-z, -Z      search on LOCATOR
-y, -Y      search on YEAR
-v, -V      search on VOLUMEINFO
-t, -T      search on TITLE
-s, -S      search on SERIES
-r, -R      search on PERIODICAL
-q, -Q      search on OPENLIBRARYID
-p, -P      search on PUBLISHER
-o, -O      search on TOPIC_DESCR
-n, -N      search on ASIN
-m          search on MD5
-l, -L      search on LANGUAGE
-i, -I      search on ISSN
-g, -G      search on TAGS
-e, -E      search on EXTENSION
-d, -D      search on EDITION
-c, -C      search on CITY
-b, -B      search on IDENTIFIERWODASH
-a, -A      search on AUTHOR

-f          fulltext search
            searches for the given words in the fields indicated by the other options.
            when no other options are given this will perform a pattern match search
            for the given words over the Author and Title fields.

-w          preview publication info before downloading (cover preview only in GUI tools)
            select one or more publications to preview and press enter/click OK.

            double-clicking a result row also shows a preview irrespective of this option,
            but this only works when using the yad gui tool

-= DIR      set download location to DIR

-$          use extended path when downloading:
            nonfiction/[topic/]author[/series]/title
            fiction/language/author[/series]/title

-u BOOL     use bittorrent (-u 1 or -u y) or direct download (-u 0 or -u n)
            this parameter overrides the default download method
            bittorrent download depends on an external helper script
            to interface with a bittorrent client

-I BOOL     use ipfs (-I 1 or -I y) or direct download (-I 0 or -I n)
            this parameter overrides the default download method
            ipfs download depends on a functioning ipfs gateway.
            default gateway is hosted by Cloudflare, see https://ipfs.io/
            for instructions on how to run a local gateway

-U MD5      print torrent path (torrent#/md5) for given MD5

-j MD5      print filename for given MD5

-J MD5      download file for given MD5
            can be combined with -u to download with bittorrent

-M MD5      fast path search on md5, only works in _books_ and _fiction_
            can be combined with -F FIELDS to select fields to be shown
            output goes directly to the terminal (no pager)

-F FIELDS   select which fields to show in pager output

-# LIMIT    limit search to LIMIT hits (default: 1000)

-x          skip database update
            (currently only the 'libgen' database can be updated)

-@ TORPORT  use torsocks to connect to the libgen server(s). You'll need to install
            torsocks before using this option; try this in case your ISP
            (or a transit provider somewhere en-route) blocks access to libgen

-k          install symlinks for all program invocations

-h          show this help message

EXAMPLES

Do a pattern match search on the Title field for 'ilias' and show the results in the terminal

$ books like ilias


Do an exact search on the Title field for 'The Odyssey' and show the results in the terminal

$ books 'the odyssey'


Do an exact search on the Title field for 'The Odyssey' and the Author field for 'Homer', showing
the result in the terminal

$ books -T 'The Odyssey' -A 'Homer'


Do the same search as above, showing the results in a list on the terminal with checkboxes to select
one or more publications for download

$ nbook -T 'The Odyssey' -A 'Homer'


A case-insensitive pattern search using an X11-based interface; use bittorrent (-u y or -u 1) when downloading files

$ xbook -u y -t 'the odyssey' -a 'homer'


Do a fulltext search over the Title, Author, Series, Periodical and Publisher fields, showing the
results in a terminal-based checklist for download after preview (-w)

$ nbook -w -f -t -a -s -r -p 'odyssey'


Walk over a directory of publications, compute md5 and use this to generate file names:

$ find /path/to/publications -type f|while read f; do books -j $(md5sum "$f"|awk '{print $1}');done


As above, but print torrent number and path in torrent file

$ find /path/to/publications -type f|while read f; do books -U $(md5sum "$f"|awk '{print $1}');done


Find publications by author 'thucydides' and show their md5, title and year in the terminal

$ books -a thucydides -F md5,title,year


Get data on a single publication using fast path MD5 search, show author, title and extension

$ books -M 51b4ee7bc7eeb6ed7f164830d5d904ae -F author,title,extension


Download a publication using its MD5 (-J MD5), using bittorrent (-u y or -u 1) to download

$ books -u y -J 51b4ee7bc7eeb6ed7f164830d5d904ae
```

```txt
$ update_libgen -h
update_libgen version 0.6

Usage: update_libgen OPTIONS

-l LIMIT     get updates in blocks of LIMIT entries
-v           be verbose about what is being updated; repeat for more verbosity:
             -v:  show basic info (number of updates, etc)
             -vv: show ID, Title and TimeLastModified for each update
-n           do not update database. Use together with -v or -vv to show
             how many (-v) and which (-vv) titles would be updated.
-j FILE      dump (append) json to FILE
-s FILE      dump (append) sql to FILE
-u URL       use URL to access the libgen API (overrides default)
-t DATETIME  get updates since DATETIME (ignoring TimeLastModified in database)
             use this option together with -s to create an sql update file to update
             non-networked machines
-i ID        get updates from ID

-H DBHOST    database host
-P DBPORT    database port
-U DBUSER    database user
-D DATABASE  database name

-a APIHOST   use APIHOST as API server
-@ TORPORT   use tor (through torsocks) to connect to libgen API server
-c           run classify over new records to get classification data
-q           don't warn about missing fields in database or api response
-h           this help message
```

```txt
$ refresh_libgen -h
refresh_libgen version 0.6.1

Usage: refresh_libgen OPTIONS

Performs a refresh from a database dump file for the chosen libgen databases.

-n           do not refresh database
             use together with '-v' to check if recent dumps are available
-f           force refresh, use this on first install
-v           be verbose about what is being updated
-d DAYS      only use database dump files no older than DAYS days (default: 5)
-u DBS       refresh DBS databases (default: compact fiction libgen)

-H DBHOST    database host (localhost)
-P DBPORT    database port (3306)
-U DBUSER    database user (libgen)
-R REPO      dump repository (http://gen.lib.rus.ec/dbdumps/)
-c           create a config file using current settings (see -H, -P, -U, -R)
-e           edit config file

-@ TORPORT   use tor (through torsocks) to connect to libgen server
-k           keep downloaded files after exit
-h           this help message
```

## IPFS, Torrents, direct download...

*Books* (et al.) can download files through IPFS (using `-I 1` or `-I y`), from torrents (using `-u y` or `-u 1`) or from one of the libgen download mirrors (the default; use `-I n`/`-u n` or `-I 0`/`-u 0` in case IPFS or torrent download is set as the default). To limit the load on the download servers it is best to use IPFS or torrents whenever possible. The latest publications are not yet available through IPFS or torrents since those are only created for batches of 1000 publications. The feasibility of torrent download also depends on whether the needed torrents are seeded, while IPFS download needs a working IPFS gateway. Publications which can not be downloaded through IPFS or torrents can be downloaded directly.
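
As a quick reference, the download method can be chosen per invocation; these examples assume direct download is the configured default:

```bash
$ xbook -I y -t 'odyssey'        # prefer IPFS for the selected publications
$ xbook -u y -t 'odyssey'        # prefer torrent download
$ xbook -I n -u n -t 'odyssey'   # force direct download from a mirror
```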

### IPFS download process

IPFS download makes use of an IPFS gateway; by default this is set to Cloudflare's gateway:

```
# ipfs gateway
ipfs_gw="https://cloudflare-ipfs.com"
```

This can be changed in the config file (usually `$HOME/.config/books.conf`).

The actual download works exactly like the direct download, only the source is changed from a direct download server to the IPFS gateway. Download speed depends on whether the gateway has the file in cache; if not, it can take a bit more time - be patient.
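
A minimal sketch of what such a gateway fetch boils down to, assuming a publication whose IPFS CID is known (the actual scripts look the CID up in the database) and the standard `/ipfs/<cid>` gateway URL layout:

```bash
ipfs_gw="https://cloudflare-ipfs.com"   # or your own gateway from books.conf
cid="bafykbz..."                        # hypothetical CID of the publication
curl -L -o "publication.epub" "$ipfs_gw/ipfs/$cid"
```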

### Torrent download process

Torrent download works by selecting individual files for download from the 'official' torrents, i.e. it is *not* necessary to download a whole torrent for a single publication. This process is automated by means of a helper script which interfaces *books* with a torrent client. Currently the only torrent client for which a helper script is available is *transmission-daemon*; the script uses the related *transmission-remote* program to interface with the daemon. Writing a helper script for other torrent clients should not be that hard as long as these can be controlled through the command line or via an API.

When downloading through torrents *books* first tries to download the related torrent file from the 'official' repository; if this fails it gives up and suggests using direct download instead. Once the torrent file has been downloaded it is checked to see whether it contains the required file. If this check passes the torrent is submitted to the torrent client with only the required file selected for download. A job script is created which can be used to control the torrent job; if the `torrent_cron_job` parameter in the PREFERENCES section or the config file is set to `1` it is submitted as a cron job. The task of this script is to copy the downloaded file from the torrent client download directory (`torrent_download_directory` in books.conf or the PREFERENCES section) to the target directory (preference `target_directory`) under the correct name. Once the torrent has finished downloading the job script will copy the file to that location and remove the cron job. If `torrent_cron_job` is not set (or is set to `0`) the job script can be called 'by hand' to copy the file; it can also be used to perform other tasks like retrying the download from a libgen download mirror server (use `-D`, this will cancel the torrent and cron job for this file) or retrying the torrent download (use `-R`). The script has the following options:
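
The core copy step the job script automates could look roughly like this; a minimal sketch assuming the `torrent_download_directory` and `target_directory` settings shown later in this README, with hypothetical file names (the real job script also checks the torrent's completion status through the helper script):

```bash
torrent_download_directory="/net/p2p/incoming"
target_directory="$HOME/Books"
src="$torrent_download_directory/2412000/00c4f3a075d3af0813479754f010c491"
dst="$target_directory/Homer - The Odyssey.epub"

# copy the finished file from the torrent client's download area
# to the target directory under its correct name
[[ -f $src ]] && cp "$src" "$dst"
```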

```txt
$ XYZ.job -h
Use: bash jobid.job [-s] [-i] [-r] [-R] [-D] [-h] [torrent_download_directory]

Copies file from libgen/libgen_fiction torrent to correct location and name

-S   show job status
-s   show torrent status (short)
-i   show torrent info (long)
-I   show target file name
-r   remove torrent and cron jobs
-R   restart torrent download (does not restart cron job)
-D   direct download (removes torrent and cron jobs)
-h   show this help message
```

### The torrent helper script interface

The torrent helper script (here named `ttool`) needs to support the following commands (a minimal skeleton follows the list):

* `ttool add-selective <torrent_file> <md5>`
  download file `<md5>` from torrent `<torrent_file>`
* `ttool torrent-hash <torrent_file>`
  get btih (info-hash) for `<torrent_file>`
* `ttool torrent-files <torrent_file>`
  list files in `<torrent_file>`
* `ttool remove <btih>`
  remove active torrent with info-hash `<btih>`
* `ttool ls <btih>`
  show download status for active torrent with info-hash `<btih>`
* `ttool info <btih>`
  show extensive info (files, peers, etc) for torrent with info-hash `<btih>`
* `ttool active <btih>`
  return `true` if the torrent is active, `false` otherwise
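
A skeleton for such a helper - hypothetical, with the client-specific parts left as comments - could dispatch on the command name like this:

```bash
#!/usr/bin/env bash
# ttool skeleton: each command must print only the raw requested data,
# without headers or other embellishments
cmd=$1; shift
case $cmd in
    add-selective)  ;; # submit torrent file "$1" with only file "$2" selected
    torrent-hash)   ;; # print the info-hash (btih) of torrent file "$1"
    torrent-files)  ;; # list the files contained in torrent file "$1"
    remove)         ;; # remove the active torrent with info-hash "$1"
    ls)             ;; # print short download status for info-hash "$1"
    info)           ;; # print extended info (files, peers, etc) for "$1"
    active)         ;; # exit 0 if torrent "$1" is active, non-zero otherwise
    *) echo "unknown command: $cmd" >&2; exit 1 ;;
esac
```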

Output should be the requested data without any headers or other embellishments. Here is an example using the (included) `tm` helper script for the *transmission-daemon* torrent client, showing all required commands:

```txt
$ tm torrent-files r_2412000.torrent
2412000/00b3c21460499dbd80bb3a118974c879
2412000/00b64be1207c374e8719ee1186a33c4d
2412000/00c4f3a075d3af0813479754f010c491
...
... (994 files omitted for brevity)
...
2412000/ff2473a3b8ec1439cc459711fb2a4b97
2412000/ff913204c002f19ed2ee1e2bdfd236d4
2412000/ffb249ae5d148639d38f2af2dba6c681

$ tm torrent-hash r_2412000.torrent
e73d4bc21d0f91088c174834840f7da232330b4d

$ tm add-selective r_2412000.torrent 00c4f3a075d3af0813479754f010c491
... (torrent client output omitted)

$ tm ls 6934f632c06a91572b4401e5b4c96eec89d311d7
ID    Done  Have  ETA      Up   Down  Ratio  Status  Name
25      0%  None  Unknown  0.0  0.0   None   Idle    762000
Sum:        None           0.0  0.0

(output from transmission-daemon, format is client-dependent)

$ tm info 6934f632c06a91572b4401e5b4c96eec89d311d7
... (torrent client output omitted)

$ tm active 6934f632c06a91572b4401e5b4c96eec89d311d7; echo "torrent is $([[ $? -gt 0 ]] && echo "not ")active"
torrent is active

$ if tm active 6934f632c06a91572b4401e5b4c96eec89d311d7; then echo "torrent is active"; fi
torrent is active

$ tm active d34db33f; echo "torrent is $([[ $? -gt 0 ]] && echo "not ")active"
torrent is not active
```

#### The `tm` torrent helper script

The `tm` torrent helper script supports the following options:
```txt
$ tm -h
tm version 0.1

Use: tm COMMAND OPTIONS [parameters]
     tm-COMMAND OPTIONS [parameters]

A helper script for transmission-remote and related tools, adding some
functionality like selective download etc.

PROGRAMS/COMMANDS

tm-active           active
tm-add              add
tm-add-selective    add-selective
tm-cmd              cmd
tm-file-count       file-count
tm-files            files
tm-help             help
tm-info             info
tm-ls               ls
tm-remove           remove
tm-start            start
tm-stop             stop
tm-torrent-files    torrent-files
tm-torrent-hash     torrent-hash
tm-torrent-show     torrent-show

OPTIONS

-k          create symbolic links
            creates links to all supported commands
            e.g. tm-cmd, tm-ls, tm-add, ...
            links are created in the directory where tm resides

-n NETRC    set netrc (/home/frank/.tm-netrc)

-H HOST     set host (p2p:4081)

-c          create a config file using current settings (see -n, -H)

-l          execute command 'ls'

-a TORR     execute command 'add'

-h          this help message

EXAMPLES

In all cases it is possible to replace tm-COMMAND with tm COMMAND

show info about running torrents:

$ tm-ls

add a torrent or a magnet link:

tm-add /path/to/torrent/file.torrent
tm-add 'magnet:?xt=urn:btih:123...'

add a torrent and selectively download two files
this only works with torrent files (i.e. not magnet links) for now

tm-add-selective /path/to/torrent/file.torrent filename1,filename2

show information about a running torrent, using its btih or ID:

tm-show f0a7524fe95910da462a0d1b11919ffb7e57d34a
tm-show 21

show files for a running torrent identified by btih (can also use ID)

tm-files f0a7524fe95910da462a0d1b11919ffb7e57d34a

stop a running torrent, using its ID (can also use btih)

tm-stop 21

get btih for a torrent file

tm-torrent-hash /path/to/torrent/file.torrent

remove a torrent from transmission

tm-remove 21

execute any transmission-remote command - notice the double dash
see man transmission-remote for more info on supported commands

tm-cmd -- -h
tm cmd -h

CONFIGURATION FILES

/home/username/.config/tm.conf

tm can be configured by editing the script itself or the configuration file:

netrc=~/.tm-netrc
tm_host="transmission-host.example.org:4081"

values set in the configuration file override those in the script
```

## Classify

Classify is a tool which, when fed an *identifier* (ISBN or ISSN; it also works with UPC and OCLC OWI/WI but these are not in the database) *or* a database name and MD5, can be used to extract classification data from the OCLC classifier. Depending on what OCLC returns it can be used to add or update the following fields:

### Always present:

- Author
- Title

### One or more of:

- [DDC](https://en.wikipedia.org/wiki/Dewey_Decimal_Classification)
- [LCC](https://en.wikipedia.org/wiki/Library_of_Congress_Classification)
- [NLM](https://en.wikipedia.org/wiki/National_Library_of_Medicine_classification)
- [FAST](https://www.oclc.org/research/areas/data-science/fast.html) (Faceted Application of Subject Terminology, basically a list of subject keywords derived from the Library of Congress Subject Headings (LCSH))

The *classify* tool stores these fields in CSV files which can be fed to the *import_metadata* tool (see below) to update the database and/or produce SQL code. It can also store all XML data as returned by the OCLC classifier for later use; this offloads the OCLC classifier service, which is marked as 'experimental' and 'not built for production use' and as such can change or disappear at any moment.

The *classify* helper script supports the following options:

```
$ classify -h
classify "version 0.5.0"

Use: classify [OPTIONS] identifier[,identifier...]

Queries OCLC classification service for available data
Supports: DDC, LCC, NLM, FAST, Author and Title

Valid identifiers are ISBN, ISSN, UPC and OCLC/OWI

OPTIONS:

-d        show DDC
-l        show LCC
-n        show NLM
-f        show FAST
-a        show Author
-t        show Title

-o        show OWI (OCLC works identifier)
-w        show WI (OCLC works number)

-C md5    create CSV (MD5,DDC,LCC,NLM,FAST,AUTHOR,TITLE)
          use -D libgen/-D libgen_fiction to indicate database

-X dir    save OCLC XML response to $dir/$md5.xml
          only works with a defined MD5 (-C MD5)

-D db     define which database to use (libgen/libgen_fiction)

-A        show all available data for identifier

-V        show labels

-@ PORT   use torsocks to connect to the OCLC classify service.
          use this to avoid getting your IP blocked by OCLC

-h        show this help message

Examples

$ classify -A 0199535760
AUTHOR: Plato | Jowett, Benjamin, 1817-1893 [Translator; Editor; Other] ...
TITLE:  The republic
DDC:    321.07
LCC:    JC71

$ classify -D libgen -C 25b8ce971343e85dbdc3fa375804b538
25b8ce971343e85dbdc3fa375804b538,"321.07","JC71","",UG9saXRpY2FsI\
HNjaWVuY2UsVXRvcGlhcyxKdXN0aWNlLEV0aGljcyxQb2xpdGljYWwgZXRoaWNzLFB\
oaWxvc29waHksRW5nbGlzaCBsYW5ndWFnZSxUaGVzYXVyaQo=,UGxhdG8gfCBKb3dl\
dHQsIEJlbmphbWluLCAxODE3LTE4OTMgW1RyYW5zbGF0b3I7IEVkaXRvcjsgT3RoZX\
JdIHwgV2F0ZXJmaWVsZCwgUm9iaW4sIDE5NTItIFtUcmFuc2xhdG9yOyBXcml0ZXIg\
b2YgYWRkZWQgdGV4dDsgRWRpdG9yOyBPdGhlcl0gfCBMZWUsIEguIEQuIFAuIDE5MD\
gtMTk5MyBbVHJhbnNsYXRvcjsgRWRpdG9yOyBBdXRob3Igb2YgaW50cm9kdWN0aW9u\
XSB8IFNob3JleSwgUGF1bCwgMTg1Ny0xOTM0IFtUcmFuc2xhdG9yOyBBdXRob3I7IE\
90aGVyXSB8IFJlZXZlLCBDLiBELiBDLiwgMTk0OC0gW1RyYW5zbGF0b3I7IEVkaXRv\
cjsgT3RoZXJdCg==,VGhlIHJlcHVibGljCg==


Classifying libgen/libgen_fiction

This tool can be used to add classification data to libgen and
libgen_fiction databases. It does not directly modify the database,
instead producing CSV which can be used to apply the modifications.
The best way to do this is to produce a list of md5 hashes for
publications which do have Identifier values but lack values for DDC
and/or LCC. Such lists can be produced by the following SQL:

libgen:         select md5 from updated where IdentifierWODash<>"" and DDC="";
libgen_fiction: select md5 from fiction where Identifier<>"" and DDC="";

Run these as batch jobs (mysql -B .... -e 'sql_code_here;' > md5_list), split
the resulting file in ~1000 line sections and feed these to this tool,
preferably with a random pause between requests to keep OCLC's intrusion
detection systems from triggering too early. It is advisable to use
this tool through Tor (using -@ TORPORT to enable torsocks, make sure it
is configured correctly for your Tor instance) to avoid having too
many requests from your IP being registered, this again to avoid
your IP being blocked. The OCLC classification service is not
run as a production service (I asked them).

Return values are stored in the following order:

MD5,DDC,LCC,NLM,FAST,AUTHOR,TITLE

DDC, LCC and NLM are enclosed within double quotes and can contain
multiple space-separated values. FAST, AUTHOR and TITLE are base64 encoded
since these fields can contain a whole host of unwholesome characters
which can mess up CSV. The AUTHOR field currently decodes to a pipe ('|')
separated list of authors in the format:

LAST_NAME, NAME_OR_INITIALS, DATE_OF_BIRTH-[DATE_OF_DEATH] [[ROLE[[;ROLE]...]]]

This format could change depending on what OCLC does with the
(experimental) service.
```
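
The batch workflow described in the help text could be scripted along these lines; a sketch under stated assumptions (the `csv` output directory, the Tor port and the pause interval are arbitrary examples):

```bash
# extract md5 hashes which have an Identifier but lack DDC data, as suggested above
mysql -B libgen -e 'select md5 from updated where IdentifierWODash<>"" and DDC="";' > md5_list
# split the list into ~1000-line sections
split -l 1000 md5_list md5_section_
# feed each section to classify, with a random pause between requests
for section in md5_section_*; do
    while read -r md5; do
        classify -@ 9100 -D libgen -C "$md5" >> "csv/update-${section##*_}"
        sleep $((RANDOM % 20 + 5))
    done < "$section"
done
```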

## import_metadata

Taking a file containing lines of CSV-formatted data, this tool can be used to update a libgen / libgen_fiction database with fresh metadata. It can also be used to produce SQL (using the `-s sqlfile` option) which can be used to update multiple database instances.

In contrast to the other *books* tools, *import_metadata* is a Python (version 3) script using the *pymysql* "pure python" driver (*python3-pymysql* on Debian) and as such should run on any device where Python is available. The distribution file contains a Bash script (*import_metadata.sh*) with the same interface and options which can be used where Python is not available.

```
$ import_metadata -h

import_metadata v.0.1.0

Use: import_metadata [OPTIONS] -d database -f "field1,field2" -F CSVDATAFILE

Taking a file containing lines of CSV-formatted data, this tool can be
used to update a libgen / libgen_fiction database with fresh metadata.
It can also be used to produce SQL (using the -s sqlfile option) which
can be used to update multiple database instances.

CSV data format:

MD5,DDC,LCC,NLM,FAST,AUTHOR,TITLE

Fields FAST, AUTHOR and TITLE should be base64-encoded.

CSV field names are subject to redirection to database field names,
currently these redirections are active (CSV -> DB):

['FAST -> TAGS']

OPTIONS:

-d DB      define which database to use (libgen/libgen_fiction)

-f field1,field2
-f field1 -f field2
           define which fields to update

-F CSVFILE
           define CSV input file

-s SQLFILE
           write SQL to SQLFILE

-n         do not update database
           use with -s SQLFILE to produce SQL for later use
           use with -v to see data from CSVFILE
           use with -vv to see SQL

-v         verbosity
           repeat to increase verbosity

-h         this help message

Examples

$ import_metadata -d libgen -F csv/update-0000 -f 'ddc lcc fast'

update database 'libgen' using data from CSV file csv/update-0000,
fields DDC, LCC and FAST (which is redirected to libgen.Tags)

$ for f in csv/update-*;do
    import_metadata -d libgen -s "$f.sql" -n -f 'ddc,lcc,fast' -F "$f"
  done

create SQL (-s "$f.sql") to update database using fields
DDC, LCC and FAST from all files matching glob csv/update-*,
do not update database (-n option)
```

## Installation

Download this repository (or a tarball) and copy the four scripts - `books`, `update_libgen`, `refresh_libgen` and `tm` (only needed when using the transmission-daemon torrent client) - into a directory which is somewhere on your $PATH ($HOME/bin would be a good spot). Run `books -k` to create symlinks to the various names under which the program can be run:

* `books`
* `books-all`
* `fiction`
* `nbook`
* `xbook`
* `nfiction`
* `xfiction`
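
A typical install could look like this; a sketch assuming the scripts sit in the current directory and `$HOME/bin` is on your $PATH:

```bash
$ install -m 755 books update_libgen refresh_libgen tm "$HOME/bin/"
$ books -k     # create the symlinked invocations listed above
```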

Create a database on a mysql server somewhere within reach of the intended host. Either open *books* in an editor to configure the database details (look for `CONFIGURE ME` below) and anything else (e.g. `target_directory` for downloaded books, `max_age` before update, `language` for topics, MD5 in filenames, tools, etc.), or add these settings to the (optional) config file `books.conf` in $XDG_CONFIG_HOME (usually $HOME/.config). The easiest way to create the config file is to run `refresh_libgen` with the required options. As an example, the following command sets the database server to `base.example.org`, the database port to `3306` and the database username to `genesis`:

```bash
$ refresh_libgen -H base.example.org -P 3306 -U genesis -c
```

Make sure to add the `-c` option *at the end* of the command or it won't work. Once the config file has been created it can be edited by hand or with `refresh_libgen -e`. The PREFERENCES section in *books* looks like this:

```bash
main () {
    # PREFERENCES
    config=${XDG_CONFIG_HOME:-$HOME/.config}/books.conf

    # target directory for downloaded publications
    target_directory="${HOME}/Books"                <<<<<< ... CONFIGURE ME ... >>>>>>
    # when defined, subdirectory of $target_directory for torrents
    torrent_directory="torrents"
    # when defined, location where files downloaded with torrent client end up
    # torrent_download_directory="/net/p2p/incoming" <<<<<< ... ENABLE/CONFIGURE ME ... >>>>>>
    # when true, launch cron jobs to copy files from torrent download directory
    # to target directory using the correct name
    torrent_cron_job=1
    # default limit on queries
    limit=1000
    # maximum database age (in minutes) before attempting update
    max_age=120
    # topics are searched/displayed in this language ("en" or "ru")
    language="en"                                   <<<<<< ... CONFIGURE ME ... >>>>>>
    # database host
    dbhost="localhost"                              <<<<<< ... CONFIGURE ME ... >>>>>>
    # database port
    dbport="3306"                                   <<<<<< ... CONFIGURE ME ... >>>>>>
    # database user
    dbuser="libgen"                                 <<<<<< ... CONFIGURE ME ... >>>>>>
    # default fields for fulltext search
    default_fields="author,title"
    # window/dialog heading for dialog and yad/zenity
    list_heading="Select publication(s) for download:"

    # add md5 to filename? Possibly superfluous as it can be derived from the
    # file contents but a good guard against file corruption
    filename_add_md5=0

    # tool preferences, list preferred tool first
    gui_tools="yad|zenity"
    tui_tools="dialog|whiptail"
    dl_tools="curl|wget"
    parser_tools="xidel|hxwls"
    pager_tools="less|more"

    # torrent helper tools need to support the following commands:
    # ttool add-selective <torrent_file> <md5>  # downloads file <md5> from torrent <torrent_file>
    # ttool torrent-hash <torrent_file>         # gets btih for <torrent_file>
    # ttool torrent-files <torrent_file>        # lists files in <torrent_file>
    torrent_tools="tm"                              <<<<<< ... CONFIGURE ME ... >>>>>>

    # database names to use:
    # books, books-all, nbook, xbook and xbook-all use the main libgen database
    # fiction, nfiction and xfiction use the 'fiction' database
    declare -A programs=(
        [books]=libgen                              <<<<<< ... CONFIGURE ME ... >>>>>>
        [books-all]=libgen                          <<<<<< ... CONFIGURE ME ... >>>>>>
        [nbook]=libgen                              <<<<<< ... CONFIGURE ME ... >>>>>>
        [xbook]=libgen                              <<<<<< ... CONFIGURE ME ... >>>>>>
        [fiction]=libgen_fiction                    <<<<<< ... CONFIGURE ME ... >>>>>>
        [nfiction]=libgen_fiction                   <<<<<< ... CONFIGURE ME ... >>>>>>
        [xfiction]=libgen_fiction                   <<<<<< ... CONFIGURE ME ... >>>>>>
        [libgen_preview]=libgen  # the actual database to use for preview is passed as a command line option
    )
```

The same goes for the PREFERENCES sections in `update_libgen` and `refresh_libgen`. In most cases the only parameters which might need changing are `dbhost`, `dbuser`, `ipfs_gw` (if you don't want to use the default hosted by Cloudflare), `torrent_download_directory` and possibly `torrent_tools`. Since all programs use a common `books.conf` config file it is usually sufficient to add these parameters there:

```bash
$ cat $HOME/.config/books.conf
dbhost="base.example.org"
dbuser="exampleuser"
ipfs_gw="http://ipfs.example.org"
torrent_download_directory="/net/p2p/incoming"
torrent_tools="tm"
```

Please note that there is no option to enter a database password as that would be rather insecure. Either use a read-only, password-free mysql user to access the database or enter your database details in $HOME/.my.cnf, like so:

```ini
[mysql]
user=exampleuser
password=zooperzeekret
```

Make sure the permissions on $HOME/.my.cnf are sane (e.g. mode 640 or 600); see http://dev.mysql.com/doc/refman/5.7/en/ ... files.html for more info on this subject.
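
For example:

```bash
$ chmod 600 "$HOME/.my.cnf"   # readable by the owner only
```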

Install symlinks to all tools by calling books with the -k option:

```
$ books -k
```

## configuration file

The configuration file is *source*d by all shell scripts; it is parsed and interpreted by import_metadata. Here are some of the more useful parameters which can be set in this file:

```
dbhost="base.example.org"
dbport="3306"
dbuser="libgen"
```

Use these to set the database server hostname, port and username.

```
torrent_download_directory="/net/incoming"
torrent_cron_job=1
torrent_tools="tm"
```

Set the torrent download directory (where the torrent client places downloaded files), whether a cron job should be created to copy downloaded publications to their final name and location, and which torrent helper tool to use.

```
use_deep_path=1
```

Add section, language, author and subject to the path name (e.g. nonfiction/German/Physics/Einstein, Albert./Die Evolution der Physik).

```
use_ipfs=1
```

Try to use IPFS when downloading; this reverts to direct download for files which do not have a defined ipfs_cid.

```
gui_tools="yad|zenity"
tui_tools="dialog|whiptail"
parser_tools="xidel|hxwls"
dl_tools="wget|curl"
pager_tools="less|more|cat"
```

Tools to be used, in|order|of|preference - the first available tool is used.

```
api=http://libgen.rs/json.php
base=http://libgen.rs/dbdumps/
ipfs_gw=https://cloudflare-ipfs.com
#ipfs_gw=http://your_own_ipfs_node.example.org:8080
```

Defines which resources to use for the API, dumps, IPFS etc.

```
classify_xml="/home/username/Project/libgen_classify/xml"
classify_csv="/home/username/Project/libgen_classify/csv"
classify_sql="/home/username/Project/libgen_classify/sql"
classify_fields="ddc,lcc,nlm,fast,title,author"
classify_tor_ports="9100,9102,9104,9106,9108"
```

Used by update_libgen to configure *classify* and *import_metadata*: defines whether files are saved and where they are saved, which fields to update in the database, and whether Tor is used and if so on which port(s). It is advisable to use more than one port to spread the traffic over several exit nodes; this reduces the risk of OCLC blocking a Tor exit node.
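
Multiple SOCKS ports imply a local Tor instance listening on each of them; a minimal sketch of the matching torrc lines, assuming the port numbers from the example above (streams arriving on different SocksPorts typically end up on separate circuits, and thus different exit nodes):

```
# torrc
SocksPort 9100
SocksPort 9102
SocksPort 9104
SocksPort 9106
SocksPort 9108
```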

There are far more configurable parameters; check the script source for more possibilities.

## *update_libgen* vs. *refresh_libgen*

If you regularly use books, nbook and/or xbook, the main (or compact) database should be kept up to date automatically. In that case it is only necessary to use *refresh_libgen* to refresh the database when you get a warning from *update_libgen* about unknown columns in the API response.

If you have not used any of these tools for a while it can take a long time - and a lot of data transfer - to update the database through the API (which is what *update_libgen* does). Especially when using the compact database it can be quicker to use *refresh_libgen* to just pull the latest dump instead of waiting for *update_libgen* to do its job.

The *fiction* database can not be updated through the API (yet), so for that database *refresh_libgen* is currently the canonical way to get the latest version.
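
For instance, refreshing just the fiction database from the latest dump could look like this (assuming `-u` accepts a subset of its default database list):

```bash
$ refresh_libgen -v -u fiction
```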

## Dependencies

These tools have the following dependencies (apart from a locally available libgen/libgen_fiction instance on MySQL/MariaDB), sorted in order of preference:

* all: bash 4.x or higher - the script relies on quite a number of bashisms

* `books`/`fiction`: less | more (use less!)
* `nbook`/`nfiction`: dialog | whiptail (whiptail is buggy, use dialog!)
* `xbook`/`xfiction`: yad | zenity (more functionality with yad, but make sure your yad supports --html; you might have to build it yourself, using --enable-html during ./configure. If in doubt about the how and why of this, just use Zenity)

Preview/Download has these dependencies:

* awk (tested with mawk, nawk and gawk)
* stdbuf (part of GNU coreutils)
* xidel | hxwls (html parser tools, used for link extraction)
* curl | wget

`update_libgen` has the following dependencies:

* jq (CLI json parser/mangler)
* awk (tested with mawk, nawk and gawk)

`refresh_libgen` has these dependencies:

* w3m
* wget
* unrar
* pv (only needed when using the verbose (-v) option)

`tm` has these dependencies:

* transmission-remote