Instability investigation #1

Open
opened 2024-04-24 14:16:37 +00:00 by oliverpool · 6 comments

@theoryshaw asked me to look into the poor performance of this instance.

Looking at the admin dashboard, I couldn't spot anything abnormal (the memory Garbage Collection might be a bit too high, but nothing conclusive).

Logging in over SSH, it appears that the VM was swapping (storing parts of the memory on disk instead of in RAM, which usually results in poor performance). After running `sudo swapoff -a && sudo swapon -a`, the swap was cleared:

```
> free -m
               total        used        free      shared  buff/cache   available
Mem:            1845         487         310           2        1047        1197
Swap:           4095           0        4095
```

To deter Linux from using the swap too soon, I also [reduced `vm.swappiness`](https://askubuntu.com/a/149427) to `10` (from its default value of `60`) by editing `/etc/sysctl.conf` and reloading the settings with `sudo sysctl --system`.

The current value can be checked with `sysctl vm.swappiness` (or `cat /proc/sys/vm/swappiness`), both of which now output `10`.
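For reference, the persistent change is a single line in `/etc/sysctl.conf`:

```
# /etc/sysctl.conf
# Prefer keeping pages in RAM; only fall back to swap under real memory pressure.
vm.swappiness=10
```

`sudo sysctl --system` then reloads all sysctl configuration files; alternatively, `sudo sysctl vm.swappiness=10` applies the value immediately without persisting it.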

Author

A couple of minutes later, the swap is being used again:

```
               total        used        free      shared  buff/cache   available
Mem:            1845         480         297           1        1067        1207
Swap:           4095          19        4076
```

Trying to [find the process responsible for swap usage](https://superuser.com/a/1443379):

```
grep '^VmSwap:' /proc/*/status \
    | grep -v '0 kB$' \
    | sed -re 's#^/proc/([0-9]+)/status:VmSwap:[ \t]+([0-9]+) kB$#\1 \2#' \
    | sort -nrk2 \
    | while read pid swap; do \
        printf "%10s kB    %-6s    " ${swap} ${pid}; \
        cat /proc/${pid}/cmdline | xargs -0 | fold -sw160 | sed -re '/^$/d; 1!s/^/\t\t\t\t/;'; \
      done
```

```
     54784 kB    638       /usr/sbin/mariadbd
      4736 kB    426       /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
      3072 kB    373       /lib/systemd/systemd-resolved
      1896 kB    123137    (sd-pam)
       896 kB    436       /usr/libexec/udisks2/udisksd
```

So `mariadbd` seems to get swapped out easily.
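The `VmSwap` column above is reported in kB; a quick sketch to express the `mariadbd` figure in MB, matching the unit used by `free -m`:

```shell
# Convert the mariadbd VmSwap value from the listing above (kB -> MB, 1 MB = 1024 kB)
echo 54784 | awk '{printf "%.1f MB\n", $1 / 1024}'
# prints "53.5 MB"
```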

The [theoretical RAM usage](https://serverfault.com/a/1020847) of mariadb does not seem too high:

```sql
SELECT ROUND(
    ( @@GLOBAL.key_buffer_size
     + @@GLOBAL.query_cache_size
     + @@GLOBAL.tmp_table_size
     + @@GLOBAL.innodb_buffer_pool_size
     + @@GLOBAL.innodb_log_buffer_size
     + @@GLOBAL.max_connections * (
         @@GLOBAL.sort_buffer_size
       + @@GLOBAL.read_buffer_size
       + @@GLOBAL.read_rnd_buffer_size
       + @@GLOBAL.join_buffer_size
       + @@GLOBAL.thread_stack
       + @@GLOBAL.binlog_cache_size)
    ) / 1024 / 1024, 1) `total MB`;
```

```
+----------+
| total MB |
+----------+
|    733.2 |
+----------+
1 row in set (0.006 sec)
```
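Should `mariadbd` still need to shrink, the largest term in the formula above is usually `innodb_buffer_pool_size`, which can be lowered with a drop-in config. A sketch, assuming a Debian-style include directory; the file name and the `256M` value are illustrative, not recommendations:

```
# /etc/mysql/mariadb.conf.d/99-lowmem.cnf (hypothetical drop-in file)
[mysqld]
innodb_buffer_pool_size = 256M
```

A restart (`sudo systemctl restart mariadb`) would be needed for the change to take effect.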
Author

A bit later, the swap is being used even more, but mainly by the `gitea` process:

```
     30124 kB    29013     /usr/local/bin/gitea web --config /etc/gitea/app.ini
      7424 kB    426       /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
      3968 kB    373       /lib/systemd/systemd-resolved
      2804 kB    432       /usr/lib/snapd/snapd
      2408 kB    123137    (sd-pam)
```
Author

Use the [`GOMEMLIMIT`](https://pkg.go.dev/runtime#hdr-Environment_Variables) env var to try to reduce the RAM consumption of Forgejo. In `/etc/systemd/system/gitea.service`, adjust the `[Service]` section:

```
[Service]
Environment=USER=git HOME=/home/git GITEA_WORK_DIR=/var/lib/gitea GOMEMLIMIT=1000MiB
```
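Systemd only picks up unit-file edits after a reload; the usual sequence is sketched below (the `pidof`-based check is just one way to confirm the variable reached the running process):

```shell
sudo systemctl daemon-reload     # re-read the edited unit file
sudo systemctl restart gitea     # restart so the new environment applies
# confirm the variable is set in the running process
tr '\0' '\n' < /proc/$(pidof gitea)/environ | grep GOMEMLIMIT
```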
Author

The instance has been upgraded to one with 4GB of RAM (instead of 2GB). The DNS records were updated accordingly.

The `/etc/systemd/system/gitea.service` file has been adjusted to `GOMEMLIMIT=2000MiB`.

```
> free -m
               total        used        free      shared  buff/cache   available
Mem:            3834         464        2348           1        1021        3207
Swap:           4095           0        4095
```
Author

Swap is still in use, but the instance seems to stay responsive:

```
> free -m
               total        used        free      shared  buff/cache   available
Mem:            3834         512         436           1        2885        3126
Swap:           4095         436        3659
```

I suspect the LFS features of being heavy RAM consumers (this should be investigated on the Forgejo side).

Author

I took a look at the requests hitting the instance and it seems that a couple of robots are indexing the content.
For instance, [Amazonbot](https://developer.amazon.com/amazonbot) is currently browsing https://hub.openingdesign.com/OpeningDesign/FreeMVD_Mirror.
Maybe https://forgejo.org/docs/latest/admin/search-engines-indexation/ could be applied to kindly ask some of them to go away.
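Following the linked Forgejo documentation, a minimal `robots.txt` could ask that bot to stay away entirely. A sketch (this is a politeness request honored only by well-behaved crawlers, and whether to block the bot this broadly is a judgment call):

```
# robots.txt
User-agent: Amazonbot
Disallow: /
```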

Reference: OpeningDesign/gitea_customization#1