# sear.c

sear.c is a lightweight replacement for [SearX](http://en.wikipedia.org/wiki/Searx) that proxies and caches search results
from the Google web search engine. Its main advantages over SearX are speed and simplicity.

## packaging

### debian and ubuntu

First add my software distribution repository [prog.sijanec.eu](http://prog.sijanec.eu) into your APT sources list. See instructions [there](http://prog.sijanec.eu).

```
apt install sear.c
systemctl enable sear.c
service sear.c start
```

### gentoo

First add my ebuild overlay repository [sijanec/ebuild](http://git.sijanec.eu/sijanec/ebuild) into your portage repos.conf. See instructions [there](http://git.sijanec.eu/sijanec/ebuild). [Read this note.](#user-content-notes)

```
emerge --ask www-apps/searc
rc-update add sear.c
rc-service sear.c start
```

## requirements

* a POSIX system
* GNU C library (uses `tdestroy(3)` if compiled without `SC_OLD_STORAGE`). `musl` supports `tdestroy(3)`, though `CC=musl-gcc` does not work.
* GNU compiler collection (it's written in GNU C - it uses nested functions).
* GNU Make. (needs to support `.NOTPARALLEL:`).
* libxml2-dev (for the simple HTTP/1.0 client and HTML parser).
* libmicrohttpd-dev (for serving results - use a reverse proxy, such as nginx, for HTTPS).
* xxd (for converting HTML pages into C arrays when compiling from source).
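
The `tdestroy(3)` requirement comes from the `tsearch(3)` family of binary tree functions used for query storage. The following is a minimal sketch of how such a cache can be keyed by search string. It is not sear.c's actual code: `struct cached_query`, `compare_queries` and `cache_get` are hypothetical names made up for illustration, and only the POSIX calls `tfind(3)` and `tsearch(3)` are used so that it compiles without `_GNU_SOURCE`.

```c
#include <search.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry; sear.c's real query type is not shown in this README. */
struct cached_query {
	char *string;	/* the search string, used as the tree key */
	int hits;	/* how many times this entry was requested */
};

/* Comparison callback for tfind(3)/tsearch(3): order entries by search string. */
static int compare_queries (const void *a, const void *b) {
	const struct cached_query *qa = a;
	const struct cached_query *qb = b;
	return strcmp(qa->string, qb->string);
}

/* Returns the entry for `string`, inserting a new one if it is not in the tree yet.
   Returns NULL if memory allocation fails. */
struct cached_query *cache_get (void **root, const char *string) {
	struct cached_query key = { .string = (char *) string };
	void *node = tfind(&key, root, compare_queries);
	if (!node) {
		struct cached_query *fresh = calloc(1, sizeof *fresh);
		if (!fresh)
			return NULL;
		size_t size = strlen(string) + 1;
		fresh->string = malloc(size);
		if (!fresh->string) {
			free(fresh);
			return NULL;
		}
		memcpy(fresh->string, string, size);
		node = tsearch(fresh, root, compare_queries);
		if (!node) {
			free(fresh->string);
			free(fresh);
			return NULL;
		}
	}
	/* a tree node points to the stored key pointer */
	struct cached_query *found = *(struct cached_query **) node;
	found->hits++;
	return found;
}
```

Start with `void *root = NULL;` and pass `&root` on every call. With glibc and `_GNU_SOURCE` defined, `tdestroy(3)` can then free the whole tree in one call, which is the GNU extension the requirement refers to.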

### supported browsers

pages that sear.c generates were tested and are usable on the following www clients: <a href=http://github.com/Eloston/ungoogled-chromium>ungoogled-chromium</a>, <a href=//gnu.org/software/gnuzilla>icecat</a>, <a href=//links.twibright.com>links</a> and many more

## compiling from source

```
make prepare	# debian only, runs apt install (run as root)
make		# compiles
./sear.c	# runs the server
```

## instructions

* run the daemon - it starts listening on HTTP port 7327 (remember it by picturing the phone keypad buttons with the letters SEAR (; )
* optional: create a reverse proxy for HTTPS
* navigate to [http://localhost:7327](http://localhost:7327) and do a couple of searches to see if everything works
* the horseshoe button redirects directly to the first result without wasting time on the results page. use if you feel lucky. (BP) 
* the painting button performs a search for images. PRIVACY WARNING: images are loaded directly from servers (not from google)
* program writes all logs to standard error
* setting the h parameter will rewrite links to HTTP from HTTPS
* setting the l parameter with a number will limit the number of displayed links to that number.
* upstream engines sometimes respond with a CAPTCHA after repeated requests. set the environment variable `SC_FALLBACK` to a URL prefix (`http://fallback.example:7327/search?`) to HTTP redirect clients in case of such upstream errors.
* the shipped systemd unit and openrc init file load environment variables from `/etc/sear.c` if it exists as `VAR=VAL`.
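
As an illustration of the `SC_FALLBACK` redirect, here is a rough sketch of how a redirect target could be assembled from the prefix and the client's query string. The function name `fallback_location` and the `q=test` parameter in the usage note are hypothetical, and the joining rule (add `?` or `&` only when the prefix does not already end in one) is an assumption; the README only states that query parameters are appended.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds a redirect target from a fallback URL prefix and a raw query string.
   Assumption: a separator is inserted only when the prefix needs one.
   Returns a malloc()ed string that the caller must free, or NULL on failure. */
char *fallback_location (const char *prefix, const char *query) {
	if (!prefix || !query)
		return NULL;
	size_t plen = strlen(prefix);
	const char *sep = "&";
	if (!strchr(prefix, '?'))
		sep = "?";	/* prefix has no query part yet */
	else if (plen && (prefix[plen-1] == '?' || prefix[plen-1] == '&'))
		sep = "";	/* prefix is ready for parameters */
	size_t size = plen + strlen(sep) + strlen(query) + 1;
	char *location = malloc(size);
	if (!location)
		return NULL;
	snprintf(location, size, "%s%s%s", prefix, sep, query);
	return location;
}
```

Under these assumptions, `fallback_location("http://fallback.example:7327/search?", "q=test")` yields `http://fallback.example:7327/search?q=test`, a string suitable for the `Location` header of an HTTP redirect.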

## configuration

configuration is done with environment variables and with build time definitions:

* environment variable `SC_PORT` containing a number defines the port, 7327 by default
* preprocessor definition `SC_LOGMEM` when set, causes the program to store all logs to memory and display them via HTTP HTML UI on /logs.html
* environment variable `SC_FALLBACK` defines a URL prefix of a search engine (possibly another sear.c instance) to which clients will be HTTP redirected when upstream engine responds with a captcha. Example: `http://fallback.example:7327/search?some=param&other=param`. HTTP query parameters are appended.
* environment variable `SC_LOGLEVEL` overrides the build time preprocessor definition `SC_LOGLEVEL`, which is by default `"SC_LOG_ERROR SC_LOG_WARNING SC_LOG_INFO SC_LOG_DEBUG"` (all log levels) and, as the name implies, sets the loglevel for both /logs.html (if enabled) and stderr logging.
* preprocessor definition `SC_OLD_STORAGE` defines whether old query storage mechanism O(n) should be used instead of the new `tsearch(3)` O(log n). This option is deprecated, but I'll leave it in for some time just in case some errors show up with the new implementation (perhaps scary security issues).
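
Reading a setting such as `SC_PORT` from the environment with a default can be sketched as follows. This is an assumption about how it could be done, not a copy of sear.c's code: `DEFAULT_PORT`, `port_from_string` and `configured_port` are hypothetical names, and falling back to 7327 on invalid input is a choice made for this sketch.

```c
#include <errno.h>
#include <stdlib.h>

#define DEFAULT_PORT 7327	/* the default port stated above */

/* Parses a decimal port number. Returns DEFAULT_PORT when the value is NULL,
   empty, not a number, followed by trailing garbage or outside 1-65535. */
unsigned short port_from_string (const char *value) {
	if (!value || !*value)
		return DEFAULT_PORT;
	char *end;
	errno = 0;
	long port = strtol(value, &end, 10);
	if (errno || *end != '\0' || port < 1 || port > 65535)
		return DEFAULT_PORT;
	return (unsigned short) port;
}

/* Looks the port up in the SC_PORT environment variable. */
unsigned short configured_port (void) {
	return port_from_string(getenv("SC_PORT"));
}
```

With this sketch, running the server as `SC_PORT=8080 ./sear.c` would select port 8080, while an unset or malformed variable keeps the default.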

when openrc init script or systemd unit file is used, environment variables in newline separated format `NAME=VALUE` are read from `/etc/sear.c`, should that file exist.

## prebuilt binaries

apart from the usual debian distribution, there are also prebuilt dynamically linked binaries built for amd64, arm64, i386 and armel, as well as debian packages.

before downloading, check that the build passed, indicated below on the badge:

[![Build Status](https://jenkins.sijanec.eu/job/sear.c/badge/icon)](http://jenkins.sijanec.eu/job/sear.c/)

* amd64: <https://amd64.sijanec.eu/prog/sear.c>
* arm64: <https://arm64.sijanec.eu/prog/sear.c>
* armel: <https://armel.sijanec.eu/prog/sear.c>
* i386: *only published in debian package repository because they are built on my personal laptop*

## screenshots

![screenshot in chromium 0](https://cdn.sijanec.eu/img/2021/04/sear.c_prtsc.png)
![screenshot in chromium 2](https://cdn.sijanec.eu/img/2021/04/sear.c_prtsc2.png)
![screenshot in chromium 3](https://cdn.sijanec.eu/img/2021/04/sear.c_prtsc3.png)
![screenshot in chromium 4](https://cdn.sijanec.eu/img/2021/04/sear.c_prtsc4.png)
![screenshot in chromium 5](https://cdn.sijanec.eu/img/2021/04/sear.c_prtsc5.png)

## security

* please email me if you find any (security) issues in the program.
* always run sear.c as an unprivileged user in a chroot (gentoo and debian distribution services do that)

## additional information

* valgrind reports a memory leak that grows with every API search query. run `make valgrind` and you'll see it. I was unable to find the bug, but it still bothers me. I wrote a small bug PoC (test/bug) but I could not replicate the bug with it (`cd test/bug; make; make valgrind; less valgrind-out.txt` - process exits with no leaks possible). Example output from sear.c valgrind with one request done is included in test/bug/example-valgrind.txt. Such a small memory leak is not a problem, since we store all extracted data from the query indefinitely anyway, but it's still pretty dumb to leak memory.
* memory allocations are not checked for failures. This needs to be fixed before GCC's `-fanalyzer` can be used.
* attributes (`__attribute__`) such as `nonnull` are not set on struct members of query types or on functions such as `htmlspecialchars`; `if (!arg) return NULL` is done instead, which is poor coding style and prevents static analysis. This needs to be fixed before `-fanalyzer` can be used.
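
To show what the last two points could look like once fixed, here is a small sketch of an `htmlspecialchars`-like helper that declares its argument `nonnull` instead of checking it at run time, while still checking the allocation. `escape_html` is a hypothetical stand-in written for this README, not the function from the sear.c sources, and the set of escaped characters is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an htmlspecialchars-like helper.
   The nonnull attribute documents that `input` must never be NULL, so there is
   no `if (!input) return NULL` and the compiler can warn about bad callers.
   Returns a malloc()ed string the caller must free, or NULL if allocation fails. */
__attribute__((nonnull))
char *escape_html (const char *input) {
	/* worst case: every byte becomes "&quot;", which is 6 bytes long */
	char *output = malloc(strlen(input) * 6 + 1);
	if (!output)
		return NULL;	/* allocation failure is checked */
	char *cursor = output;
	for (; *input; input++) {
		const char *entity = NULL;
		switch (*input) {
			case '&': entity = "&amp;"; break;
			case '<': entity = "&lt;"; break;
			case '>': entity = "&gt;"; break;
			case '"': entity = "&quot;"; break;
		}
		if (entity) {
			size_t length = strlen(entity);
			memcpy(cursor, entity, length);
			cursor += length;
		} else
			*cursor++ = *input;
	}
	*cursor = '\0';
	return output;
}
```

Note that once a parameter is marked `nonnull`, the compiler is allowed to assume it is never NULL, so callers have to guarantee that themselves.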

### notes

* **gentoo ebuild**: openrc's start-stop-daemon lacks support for easy creation of unprivileged daemons in chrooted environments with logging enabled, which sear.c absolutely requires due to it being in an early, unstable alpha stage. [a pull request was submitted to openrc that adds such features](http://github.com/OpenRC/openrc/pull/517); until it's merged and its changes are in gentoo, sear.c's init script is unusable.