linux kernel mount /proc at boot time before running init

hello

i am trying to make a linux kernel mount /proc at boot time before running rcinit or init

do any of you know of a builtin way using some hidden magic command line parameters ?

i'd rather not write a tiny kernel module just in order to do that but i assume this is feasible and would welcome information if it happens this is required and you can save me some time. on the other hand, if it comes to that, crafting a small c program to mount /proc and run the init command seems simpler. i guess linking /etc/resolv.conf to wherever the kernel stores dhcp acquired settings would still work if /proc is mounted after the lease is acquired but have not checked... yet.

context : working with very minimal ram-based oses and trying to remove the shell part entirely. i already managed to run a linux with exactly 2 executable files : the kernel and a single program. but i am still missing a few functionalities such as mounting /proc

thanks

EDIT: changed initially misguiding formulation : i want /proc to be mounted AFTER loading initrd and BEFORE running whatever init executable

/proc is not a mountable partition.
It is a reflection of system state.
Initrd is what runs the processes on the system.

I.e. you are trying to build the second floor right after laying the foundation.

The rootfs (/) has to be present before /proc is available.

ASKER

yes. my systems can run fully from initrd rather than a regular root memfs.
this allows the boot loader to load the initrd so the rootfs should be present before at least some of the modules are loaded. ( and anyway initrd is not the last module to be loaded so initrd=whatever should also work )

currently i use a busybox to mount /proc and other tasks before running the init command, and it actually features a simplistic rc system as well. nevertheless that is only optional and my main goal is to remove as much stuff as i can from the filesystem. the rc subsystem will be optional and likely hardly ever needed.

ideally, the minimalist initrd would only contain the software to run, config files, and possibly required libs
currently, it also contains busybox/toybox and a bunch of shell scripts
it also works with an alpine base or debootstrap but that is not my goal

ideas using regular command line arguments ?

What you're trying to do seems like creating an entirely new Distro around BusyBox.

You already seem familiar with Alpine, so you can use Alpine or you can dig into the Alpine boot sequence to roll your own Distro.

Using Alpine will likely be much faster to get working.

Also, keep in mind, if you ever install any other software, like any LAMP Stack software, you instantly lose all perks Alpine provides.

For example, if you compare Alpine to Ubuntu, Alpine will seem smaller.

Then if you install a normal LAMP Stack (Apache + MariaDB + OpenSSL + WordPress/CMS) you'll find there's <1% size difference between the resulting Alpine vs. Ubuntu system.

This means minimizing your Base OS size is only meaningful, if you run simple custom code.

Installing any other code at all, will bloat your entire system to the point of Ubuntu, which is far more versatile to use than Alpine or rolling your own.

what are the resource limits you are dealing with?

ASKER

@david: i do not wish to rebuild alpine or a similar minimal distribution. i already have something that works which i am trying to minimize further. i may piggyback on alpine packages and have been experimenting in this direction, but i have different goals. i want something much more minimal than a complete LFS. and i'd rather keep on topic for now if you don't mind.

@arnold: no specific resource limits. i work on minimal diskless vms that barely load an initrd and run a kernel with a bunch of options to configure the network. typically the minios runs from initrd only and merely contain what a slim chroot would. logging is handled outside of the vm through console logging, possibly over the network ; all the network emulation is in the hypervisor ; time is inherited from the host ( still need to work on that one a little ) ; both static ip and dhcp can now be configured through kernel options ; ... once i accept i need a custom kernel, i do not need modules either and the shell part seems overkill if it is only going to mount /proc

i'm pretty sure a 5k c program can mount /proc and run init, but i would like it better performed by the kernel if possible

i'm also interested in ways to integrate name resolution with the information populated by the in-kernel dhcp. turns out quite a few programs can run without /proc and i'd actually rather not mount it at all when it is not needed. unfortunately i do not know how to read the information besides cat /proc/net/pnp
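For the case where /proc is mounted, the same information can be read from a C program instead of cat. This is only a sketch: it assumes /proc/net/pnp holds resolv.conf-style lines (comment lines, an optional domain line, then nameserver lines), and the names parse_nameservers, read_nameservers, MAX_NS and NS_LEN are made up for the illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NS 3
#define NS_LEN 46   /* enough for a textual IPv4 or IPv6 address */

/*
 * Collect "nameserver <addr>" entries from resolv.conf-style text.
 * Returns the number of servers stored in out[].
 */
int parse_nameservers(const char *text, char out[][NS_LEN], int max)
{
    int found = 0;
    const char *line = text;

    while (*line && found < max) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len < 128) {
            char tmp[128];
            char addr[NS_LEN];

            memcpy(tmp, line, len);
            tmp[len] = '\0';
            /* %45s bounds the copy; comment and domain lines do not match */
            if (sscanf(tmp, "nameserver %45s", addr) == 1)
                strcpy(out[found++], addr);
        }
        line = end ? end + 1 : line + len;
    }
    return found;
}

/* Read a small file and parse it; returns -1 if it cannot be opened. */
int read_nameservers(const char *path, char out[][NS_LEN], int max)
{
    char buf[1024];
    size_t n;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_nameservers(buf, out, max);
}
```

A caller would pass "/proc/net/pnp" as the path. This does not remove the need for /proc, it only replaces the shell part.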

ASKER

maybe i should explain a bit further : these are currently mostly used as reverse proxies typically running haproxy, perdition, pen, nginx and such software or custom microservices that require no disk access. since the whole system is in ram, i do not use disk caches and i keep the base as small as it gets. a typical complete initrd is smaller than the modules directory alone in an ubuntu installation. i'm pretty sure the whole userland including the ramdisk can run out of less ram than systemd alone. multiple instances can be run from the same kernel and/or initrd. i mostly run them using pxe but i want to keep this part open for now.

Hi Skull....

/proc is normally mounted during the system boot process, slightly before external file systems are mounted.
In the initial ramdisk you can mount it BEFORE the pivot_root step is done.
Mostly this is done like:
/ is mounted as initrd.....
/new is the root partition to be....

mount /${READ_ROOT} /new 
mount -t proc proc /new/proc
#... some other init
pivot_root /new /old
# close any links to /old
exec /init

umount /old   # forget initrd  in new init environment/.


Some care may be needed to NOT have files open on the initramfs....
Obviously you need to load any modules etc. into the initrd.

If you don't have pivot_root:

/* vi: set sw=4 ts=4: */
/*
 * pivot_root.c - Change root file system.  Based on util-linux 2.10s
 *
 * busyboxed by Evin Robertson
 * pivot_root syscall stubbed by Erik Andersen, so it will compile
 *     regardless of the kernel being used.
 *
 * Licensed under GPL version 2, see file LICENSE in this tarball for details.
 */
#include "libbb.h"

extern int pivot_root(const char * new_root,const char * put_old);

int pivot_root_main(int argc, char **argv) MAIN_EXTERNALLY_VISIBLE;
int pivot_root_main(int argc, char **argv)
{
        if (argc != 3)
                bb_show_usage();

        if (pivot_root(argv[1], argv[2]) < 0) {
                /* prints "pivot_root: <strerror text>" */
                bb_perror_nomsg_and_die();
        }

        return EXIT_SUCCESS;
}


You may want to delve into Gentoo's initramfs setup. There it is used to allow booting from encrypted root filesystems etc.
(Although /proc is mounted in the regular boot).

ASKER

thanks a lot for chipping in @noci, hello... that will prove quite useful shortly.

my system runs from a ramdisk ( or several ). my current build process creates an os that runs from initrd. i currently do not plan on using pivot_root in this project since there is no rootfs at all besides the initrd itself.

thanks as well for gentoo's initrd. it does quite a few things i spent time rewriting. i also dug into the init system of suse's installer : the c program can parse most of the arguments a kernel may or may not understand such as "ip=dhcp" and the like.

the current status is the following

- the most minimal os can work without the shell part. i can grab an ip from dhcp and run nginx with about the same files you would stick in a chroot. and it seems /proc is not required for most software i need. but i am missing a working DNS

- i still maintain a more heavyweight variant with minimal shell scripts that mostly mount /proc, load all modules, handle environment vars such as ip=dhcp and then exec whatever program

- i would still like a way to mount /proc directly if possible, but my remaining issue is mostly to get a working dns when dhcp is performed by the kernel and the initrd does nothing since nginx (or whatever main program) is run as init ! i found hackish ways when pxe, though.

i am considering a small C program if that is the only way to get there but i'm still hoping for another way. anything that would allow the kernel itself to perform the setup would be great. anything that would allow to grab the dns servers and set them up without using /proc might help. currently the best i can do relies on either mounting /proc or parsing dmesg's output to grab the dns server.

i am hoping perhaps some weird nss setting allows to grab the servers from kernel structures without relying on /proc or /sys ... ?

thanks for your help

DNS server? Probably not.
DNS client... For a DNS client you need the resolver libraries and the /etc/resolv.conf setup. That is not done by the kernel.
Busybox can also mount filesystems (/proc), provide init etc. (I expect you already use it), working from the kernel auto config...
Busybox also has udhcpc on board to query for them. Using a script like https://git.busybox.net/busybox/tree/examples/udhcp/simple.script
can help too. It can also handle re-requesting the IP address... (you may need a DHCP client anyway for prolonged use of IP addresses, depending on setup choices).

Then there is this for a real minimal startup:
https://www.kernel.org/doc/html/latest/driver-api/early-userspace/early_userspace_support.html?highlight=ipconfig
The GIT repo: http://git.kernel.org/?p=libs/klibc/klibc.git
The ipconfig tool from there can be the helper you need, it also requires /sys/class/net though.

Oh nss is a C runtime library construct. probably not the place to look, as it uses /etc/hosts & /etc/resolv.conf as source.

Since you refer to what you're doing in the plural, this suggests many instances of your system...

And since there's no persistent disk space involved...

One solution might be just to spin up Docker instance microservices for each instance of your proxy system.

@david, I have a hunch this is about hardware appliances... So docker is just another layer of overhead.

@noci, that's what I thought in the beginning also... then... the author talked about running HAProxy, which requires a massive amount of dependency software to be installed... so...

Likely running Alpine at the machine level, then Docker instances for HAProxy (or whatever else might be run) will probably be lightest weight.

And... All developers seem to go through a minimalistic stage... What I found when I went through my minimalistic stage was there's a massive difference in code weight if only an OS is running... then this changes whenever any additional code is installed... like HAProxy, or Apache or anything else.

Once a single software package is installed, the OS weight difference becomes lost in the noise of the actual code installed.

Also, Ubuntu is the target OS for testing LXD + Docker, so I opted for Ubuntu, as it's one of the most stress tested Distros available.

More eyes on code == Good.

Actually haproxy needs some shared libraries. For ease of use those are installed by installing their packages. If all the needed libraries exist in some directory you won't need much.
ldd for haproxy shows:

        linux-vdso.so.1 (0x00007ffc539cb000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f6e96fb8000)
        libslz.so.1 => /usr/lib64/libslz.so.1 (0x00007f6e96d90000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6e96d70000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f6e96d60000)
        libssl.so.1.1 => /usr/lib64/libssl.so.1.1 (0x00007f6e96cc8000)
        libcrypto.so.1.1 => /usr/lib64/libcrypto.so.1.1 (0x00007f6e96a08000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f6e96990000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f6e967d0000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6e97358000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f6e967b0000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f6e967a8000)


Not that much... and possibly some loadable modules and their dependencies if used.
Building it might need more. The environment to set up iptables etc. might need some extras. If all that is not needed, one doesn't need a lot.
Busybox is a Swiss army knife providing most of the tools needed in one program (and it can be statically linked).
Lets not hijack the Q on getting the leanest env.

ASKER

Lets not hijack the Q on getting the leanest env.

gladly, thanks.

<OFF TOPIC @david>
just to make it clear : i am not interested in any comment suggesting to use whatever OS or docker, or lxd or whatever... i have experience with all these techs, and i have reasons to do differently. ask me why in a private or different thread if you want. not here.

the most minimal ubuntu is plagued with trivial privilege escalations, one of which found by accident last year by myself and i believe there are dozens more.

my current setup allows to run said apps, fully functional, out of around 50M of RAM with the latest kernel. i expect to trim it down to around 30. the smallest dist i ever worked with could boot from an old-fashioned diskette.
<OFF TOPIC>

---

shared libraries are already handled. i already have something that has been working for months. it is built like a chroot, but with some smarts that handles symlinks, dynamic libs, kernel modules, ... and a bunch of other things recursively. i just want to trim the kernel and rc system further.

for the sake of discussion, i managed to track shared libs but not whatever is loaded at run time

---

>> DNS server --? probably not.
indeed not, unless the role of that machine/container is to be a dns server

>> DNS client ... For DNS client you need the resolver libraries. and /etc/resolv setup. not done by the kernel.
when /proc is mounted, it does work by symlinking /etc/resolv.conf -> /proc/net/pnp
other required files are already setup.

>>Busybox can also mount filesystems (/proc), provide init etc. (i expect you to already use it..).

yes. i run either a single script or a minimal rc system depending on the options.

either way, the rc system merely picks var=val options from the environment, which comes from whatever var=val command line options were not parsed by the kernel or its modules.

>>from kernel auto config...

not sure i understand ... ?

>> Busybox also has udhcpc on board to query for them.. using a script like: https://git.busybox.net/busybox/tree/examples/udhcp/simple.script
can help also. That can help re-requesting the IP address... (you may need a DHCP client anyway for prolonged use of IP addresses, depending on setup choices).

yes. works like a charm. the rc script is the callback script as well to keep things in one place. i even have a variant that renews the lease periodically. it is rather useless, though. in my current setup, this kicks in if ip=dhcp is found on the command line. when i use the custom kernel that handles that, the kernel "eats up" the option so the script does nothing.

>> Then there is this for a real minimal startup:
https://www.kernel.org/doc/html/latest/driver-api/early-userspace/early_userspace_support.html?highlight=ipconfig

kinit may prove VERY useful, thanks a lot.
:[[[ how the hell did i initially miss that ^^

i have a variant that just builds the os using the running kernel and modules. seems like a nifty replacement for the suse init or my own rc system. exactly what i was looking for when i started the project. still hoping i can make its use optional, though.

on the other hand, i don't think it is even worth trying to compile anything against klibc but i'd be more than happy to be wrong

>> The ipconfig tool from there can be the helper you need, it also requires /sys/class/net though.

not sure : sys is something i definitely do not want. busybox does manage to display the ip address, routes and the likes without /proc or /sys ; i loosely remember about a hidden flag to the bind syscall.

the kernel does perform the ip/dhcp setup properly. what i miss is a way to grab the dns servers that were returned.

>> Oh nss is a C runtime library construct. probably not the place to look, as it uses /etc/hosts & /etc/resolv.conf as source.

yeah... i configured pretty much all nss options with "file". nothing builtin nss seems to help

thanks for your help

Kernel auto config: the kernel has a built-in component (ipconfig) that can do DHCP/RARP/BOOTP for you; it leaves its data in /proc/net.
(Kernel level auto configuration: CONFIG_IP_PNP, CONFIG_IP_PNP_DHCP, CONFIG_IP_PNP_RARP, from make menuconfig in the kernel source tree:
Networking support > Networking options > TCP/IP networking.....)
Then the kernel will try to fetch a DHCP lease during boot and keep the results in internal variables that are accessible through /proc & /sys.

The ipconfig tool from the early userspace sources mentioned earlier uses /sys/class/net (you don't need to use that filesystem through scripts; the tool is able to just pump out the values that were collected.
I checked the sources on that, main.c holds everything...).
In https://git.kernel.org/pub/scm/libs/klibc/klibc.git/tree/usr/kinit/ipconfig/main.c, lines 25 (location), 162 (dumping the data) and 700 (opening the device) are of primary interest.

If you use DHCP you will need to extend the leases anyway, so busybox udhcpc might be the best way, with a script that generates your /etc/resolv.conf (if that exists, nss will also work using queries..., if libresolv.so is available).

I would keep klibc away from haproxy etc. klibc is a minimal runtime, most probably lacking all sorts of stuff that might be expected by haproxy et al.

I am not sure that you want nginx, haproxy etc. as init programs. Then they also need to act as the system reaper of zombie processes. Better to leave that to busybox init.
Also, if the PID 1 process exits, that is a sudden death event for the kernel (it panics).

ASKER

Thanks, i will be digging into the code shortly.

I started with udhcpc from busybox. This works.

I followed up using a custom kernel with the options you mentioned. This also works but the dns config is only functional if /proc is mounted.

I have scripts that handle reaping but the target programs are usually event driven and properly written so i do not expect zombies. This is still quite experimental but works fine for now.

I am still striving to do the bulk of the work in kernel. this already works for very simple cases with no dns.

Thanks again, i will dig into this shortly

If you KNOW the DNS servers then you can supply a static resolv.conf... (or even select one from a few using e.g. ping to test reachability; I mean ping with a limited TTL, so that the servers are local or at most 1, 2 or 3 hops away).

ASKER

the dns servers addresses are not static. i would not be bothering if they were ;)
IRL, most of the machines i deal with use static dns or no dns at all. unfortunately, i start to feel the need for dynamic setups using SRV records, early blacklisting, certificate checks, and dns is versatile enough to handle all those needs and others.

at best, i can allow rules such as "dns is the gateway", or possibly x.y.z.253 where x.y.z are the first octets of my own ip.
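For reference, the x.y.z.253 rule is only a few lines of C if an intermediate program is accepted after all. This is a sketch that assumes a /24-style layout where only the last octet changes; the function name guess_dns_from_ip is invented for the example.

```c
#include <stdio.h>

/*
 * Replace the last octet of a dotted-quad IPv4 address with `host`.
 * Returns 0 on success, -1 if `ip` is not a dotted quad or `out` is too small.
 */
int guess_dns_from_ip(const char *ip, unsigned host, char *out, size_t outlen)
{
    unsigned a, b, c, d;
    char trailing;
    int n;

    /* exactly four numbers and nothing after them */
    if (sscanf(ip, "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255 || host > 255)
        return -1;
    n = snprintf(out, outlen, "%u.%u.%u.%u", a, b, c, host);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

With "192.168.7.42" and host 253 this yields "192.168.7.253", which could then be written out as a nameserver line.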

but all these solutions require some c or shell script to run after the kernel boots and before the main program is run. which makes them pointless since i can acquire dns servers easily in that case. this has already been working for months using udhcpc and is easy to handle with pnp as well ( a trivial solution is to link or copy /proc/net/pnp with /etc/resolv.conf ). kinit or a possibly modified version of kinit will help make things lighter, thanks.

i already also have variants that will even remove any traces from busybox before running the main program. i guess kinit can be modified to unmount /sys and /proc, and rm itself before launching the main program as well.

the issue here is how to get a working dns without such an intermediate program.

i want it light, i want it to boot fast ( <5s in an ESX VM ), i want it to run on microappliances or raspberries for frontal services, ... and i need to show this can run out of comparable resources as docker while being arguably MUCH safer ( for large and poorly designed microservices-based infra that will only scale by running a stupid number of containers ).

... and i also just hate the idea of maintaining and integrating an additional executable or set of scripts and interpreter just for the sake of setting up a nameserver.

ASKER

if i can somehow force the libc to use nameservers from the environment, that would give a trivial solution for pxe setups since i can instruct grub to read the nameservers and add them to the kernel's command line as var=val so they end up in the environment after boot.
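As far as I know the libc resolver does not take nameserver addresses from the environment, so something still has to turn such a variable into resolv.conf text. A sketch of that conversion follows; the comma-separated format and the function name format_resolv_conf are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn a comma-separated server list such as "10.0.2.3,10.0.2.4" into
 * resolv.conf text. Returns the text length, or -1 if `out` is too small.
 */
int format_resolv_conf(const char *servers, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    while (*servers) {
        size_t len = strcspn(servers, ",");   /* length of the next entry */

        if (len > 0) {
            int n = snprintf(out + used, outlen - used,
                             "nameserver %.*s\n", (int)len, servers);
            if (n < 0 || (size_t)n >= outlen - used)
                return -1;                    /* would not fit */
            used += (size_t)n;
        }
        servers += len;
        if (*servers == ',')
            servers++;                        /* skip the separator */
    }
    return (int)used;
}
```

The input would come from something like getenv("dns"), where dns is a hypothetical var=val name added to the kernel command line by the boot loader.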

The kernel commandline should be able to pass such data when you can pass a commandline to the kernel.
You are free to add some extra parameters for your own setup there... (gentoo passes items this way like real_root= for cryptsetup to allow pivoting to an encrypted container in the initramfs)
Then you still need to parse the kernel commandline for those items. (again requiring some kind of tool for those, and a mounted /proc).
from a program the syscall: mount("proc", "/proc", "proc", MS_NOATIME, NULL) probably will work.
for /sys you can get away with mount("sysfs", "/sys", "sysfs", MS_NOATIME, NULL) i think, unless you also need some config fs (/sys/kernel/config ) or stuff like efivars, (gentoo's /etc/init.d/sysfs might provide some inspiration).
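Put together, the "mount /proc then run the real init" helper could look roughly like the sketch below. The variable name realinit, the /sbin/init fallback and both function names are assumptions for illustration; the mount flags are left at 0 here.

```c
#include <stddef.h>
#include <unistd.h>
#include <sys/mount.h>

/*
 * Pick the program to exec. `requested` would typically come from
 * getenv("realinit"), a hypothetical var=val kernel argument.
 */
const char *choose_init(const char *requested)
{
    return (requested && requested[0] == '/') ? requested : "/sbin/init";
}

/*
 * Mount /proc, then replace this process with the real init.
 * Only returns on failure, with -1 and errno set by mount() or execv().
 */
int mount_proc_and_exec(const char *requested)
{
    const char *path = choose_init(requested);
    char *argv[2];

    /* /proc must already exist as a directory in the initrd; needs root */
    if (mount("proc", "/proc", "proc", 0, NULL) != 0)
        return -1;

    argv[0] = (char *)path;
    argv[1] = NULL;
    execv(path, argv);
    return -1;                  /* execv only returns on error */
}
```

Because execv() replaces the process image, the helper keeps PID 1 and leaves nothing of itself running once the real program has started.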

ASKER

Yeah. I was already doing all of that before posting. My busybox solution works like that.

Not sure there is a way to get a working dns without relying on a custom script or presetting dns servers.

Thanks for kinit anyway. This allows me to create a much leaner version ( minus 6M and less bloaty ) that works with an uncustomised kernel which is an interesting tradeoff. I need to add modules autoloading, and optional unmounting of the sys and proc mounts before running init, which seems reasonably easy to maintain. The whole os can be generated in mere seconds. Though unsatisfying, this allows to build the containers pretty much on demand.

--

I guess the answer to my initial question is it cannot be done unless i write a module.

I am keeping the thread open in case something more pops up

ASKER

Btw, you do not need to parse the kernel command line : any var=val option the kernel does not handle ends up in the environment. That works without proc.

Btw, you do not need to parse the kernel command line : any var=val option the kernel does not handle ends up in the environment. That works without proc.

Learned something... Thnx

ASKER

No problem. I guess you see the idea better : the config is crafted preboot and i want to remove as much of the postboot intelligence.

I have running configs using vms and either pxe or on the fly vm creation. I want to setup both a docker alternative for a bunch of microservices, and be able to run these oses on microappliances sitting between internet and a local firewall.

Irl, most of these do not need a working dns, but i would love the ability to use local blacklists and perform runtime configurations using srv records. All this is builtin the target software.

I may be wrong but i believe having no proc, no sys, and no external access except through a single listening port below 1024 on separate appliances or vms brings a security level that is way better than docker containers.

I am unsure selinux would help in this context, so even runinit() seems overkill. I am even considering single user mode. If the kernel crashes when the program dies, all the better ;)

You only need to be sure the kernel will reboot, and not hang in a " disable interrupts; loop: goto loop" mode of operation.
selinux seems overkill here. That is primarily for tightening the options of root. Also it will require setting up security labels on filesystem objects.
Single user mode is something in the init process, not exactly in the kernel.

wrt. your initial Q, I agree then that you need a module. With DHCP/BOOTP you still need to extend leases, which would require some dhcp client.

ASKER

Thanks

By single user i meant skip compiling multiuser altogether. Not tested yet and i have no idea what that will actually do ;) i guess i will still be able to drop privileges...

Dhcp lease renewal is useless to me. I merely use dhcp on some setups where creating and firing the machines cannot be easily scripted. Each machine will have an infinite lease duration. Most of the time, the addresses will be static anyway.

I am wondering if you know how complex writing such a module would be. I have rather little experience with c and that does not cover writing a kernel module from scratch.

Writing kernel mode code is complex. There are a lot of rules. I haven't done it for Linux/Unix, but I do know OpenVMS. There is a lot of stuff you can get away with in user-mode programs..., while the slightest error in kernel mode code is rewarded with a crash & halt, and a wade through a dump if you get lucky.
And you need a way to recover... (although that is a lot simpler with VMs these days).
It depends on what you want as a learning curve... (it might be quite similar to a wall...) and on the time you have to invest of course.
IMHO in your case I doubt it will be worth the trouble.

There is literally no difference between single user code / multi user code. For init the numbers 0-6, s etc. are just conventions. The difference is implemented in the scripting run by the init system (whatever is used for it: open-rc, sysv-init etc.). systemd is in quite a different ballpark; in your case you will want to avoid that swamp.
The OS will transition at one moment from a program to an OS. That is the moment the scheduler starts. Converting programs to become part of the kernel will be tough... there is a tendency to make the kernel less complex by moving stuff to user-land (a process), see fuse modules (file system drivers in user mode). Linus Torvalds once started Linux because he wanted a monolithic OS, not one like Minix with all drivers as user-mode code... (and Minix might now be the most popular OS in the world, as recent Intel x86 chipsets run it inside the Management Engine).
Arguably MS-DOS & CP/M (for 8080) are flat OSes (they effectively have no scheduling etc.) and are single tasking.

ASKER

I was meaning something that would barely mount proc or write resolv.conf at boot time. I have no wish to run the programs in kernel. Thanks for all the useful information.

And +1 for asking beastie burn the damn swamp thing ;)

ASKER

update :

writing a kernel module to mount /proc might be impossible as do_mount is not supposed to be available in modules. i have coded the module which is actually simple, but failed to integrate it in the kernel for now. somehow it is selectable in menuconfig but silently ignored during build : the .o file is not generated and the compilation succeeds ^^

on the other hand, i successfully managed to hack net/ipv4/ipconfig.c to create a device with the required information !

/ # mknod /y c 253 0
/ # cat /y
#Hello World
#PROTO: DHCP
nameserver 10.0.2.3


the above is what happens in qemu using busybox as init
... if I create /etc/resolv.conf as the adequate node, name resolution works !
i still have to figure out what number i can safely use. for the sake of testing, i used zero and printed the major number in order to test

i am unsure i will pursue with automagic mounting of /proc as i see no point to do it in the kernel now : I cannot think of a use case where /proc would be needed that does not have a different reason to use a userland rc mechanism ( such as using a generic kernel rather than compile a dedicated one which is also a supported mode of operation )

i still have ways to go before i can understand how to make a clean patch ;), minify the kernel properly...

@noci, you were very helpful, thanks again. i am unsure how to close that question yet. i'll try to figure out why my mount_proc module does not work. hopefully i'll come with a definitive answer before i get fed up with struggling. i barely ever wrote much C code in my life, nor learnt C, so working with kernel code is something of a hell of a headache.

ASKER

here is my attempt at building a module

if i compile it to .ko, it complains do_mount is not available

i have not managed yet to get it to compile with the kernel

#include <linux/module.h> /* Needed by all modules */
#include <linux/kernel.h> /* Needed for KERN_INFO */
#include <linux/init.h> /* Needed for the macros */
#include <linux/fs.h> 
/*#include "do_mounts.h"     in ./init/do_mount.c */

/*
DOCS
https://www.lynxbee.com/integrating-kernel-module-inside-linux-kernel-source-and-building-it-as-part-of-kernel-compilation/
https://hardikpatelblogs.wordpress.com/2010/11/19/8/
https://www.tldp.org/LDP/lkmpg/2.6/html/index.html
*/

/*
TOSCRIPT

write mount_proc.c , Makefile , Kconfig /tmp/mkinitrd.buildkernel/tinyvm/linux-5.7/drivers/mount_proc
the other 2 files are cat in the below comments

append << obj-$(CONFIG_MOUNTPROC)       += mount_proc/ >> IN drivers/Makefile

add <<source "drivers/helloworld/Kconfig">> in drivers/Kconfig 
*/

/*
$ cat drivers/mountproc/Makefile 
obj-$(CONFIG_MOUNTPROC) += mount_proc.o
*/

/*
$ cat drivers/mountproc/Kconfig 
#
# mount_proc driver as part of kernel source
#
 
menu "Mount Procfs"
 
config MOUNTPROC
        # depends on IP_PNP_DHCP
    depends on BLK_DEV_INITRD
    depends on PROC_FS
        tristate "mount_proc module"
        default y
        help
          mount_proc module.

endmenu
*/

/* the exit function is kinda useless unless we build this as a (pointless) module */



static int __init mountproc_init(void)
{
    int err;

    printk(KERN_INFO "mounting proc filesystem...\n");

    /* do_mount() returns 0 on success and a negative errno on failure */
    err = do_mount("none", "/proc", "proc", 0, NULL);
    if (err) {
        printk(KERN_WARNING "failed to mount procfs (%d)\n", err);
        return err;
    }

    printk(KERN_NOTICE "procfs was mounted on /proc\n");
    return 0;   /* an init function must return 0 on success */
}

static void __exit mountproc_exit(void) {
        printk(KERN_INFO "Goodbye, world\n");
}
  
module_init(mountproc_init);
module_exit(mountproc_exit);


ASKER

i do not have a proper patch for ipconfig

in case you are interested

int device_init(void)
{
    int i;
    int length = 0;
    
    Major = register_chrdev(0, DEVICE_NAME, &fops);

    if (Major < 0) {
      printk(KERN_ALERT "Registering char device failed with %d\n", Major);
      return Major;
    }

    printk(KERN_INFO "I was assigned major number %d. To talk to\n", Major);
    printk(KERN_INFO "the driver, create a dev file with\n");
    printk(KERN_INFO "'mknod /dev/%s c %d 0'.\n", DEVICE_NAME, Major);
    // minor numbers are used to differentiate multiple instances of a
    // device that use the same driver.
    printk(KERN_INFO "Try various minor numbers. Try to cat and echo to\n");
    printk(KERN_INFO "the device file.\n");
    printk(KERN_INFO "Remove the device file and module when done.\n");
    
    
    /* scnprintf() returns the bytes actually written, so length can never run past BUF_LEN */
    length += scnprintf(msg+length, BUF_LEN-length, "#Hello World\n");
    
    if (ic_proto_used & IC_PROTO)
        length += scnprintf(msg+length, BUF_LEN-length,"#PROTO: %s\n",
               (ic_proto_used & IC_RARP) ? "RARP"
               : (ic_proto_used & IC_USE_DHCP) ? "DHCP" : "BOOTP");
    else
        length += scnprintf(msg+length, BUF_LEN-length,"#MANUAL\n");

    if (ic_domain[0])
        length += scnprintf(msg+length, BUF_LEN-length,
               "domain %s\n", ic_domain);
    for (i = 0; i < CONF_NAMESERVERS_MAX; i++) {
        if (ic_nameservers[i] != NONE)
            length += scnprintf(msg+length, BUF_LEN-length, "nameserver %pI4\n",
                   &ic_nameservers[i]);
    }
    
    return SUCCESS;
}


this initialises the device. the rest is pretty much copy pasted from the kernel chardev.c example

it needs to be called anywhere towards the end of ip_auto_config()

again, most of the code is copy pasted from examples

obviously, i need some more error handling
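
One piece of that error handling concerns the accumulation into the buffer. The standard snprintf returns the length it would have written, not the length it stored, so a running total can overshoot the buffer size. In kernel code scnprintf avoids this. The userland sketch below shows the same idea; buf_appendf is a hypothetical helper name, not something from the posted code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to buf (capacity cap, current length len).
 * Returns the new length, which is never more than cap - 1, so the
 * remaining size computed by the next call cannot wrap around.
 * Output that does not fit is truncated. */
static size_t buf_appendf(char *buf, size_t cap, size_t len,
                          const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(cap > 0 && len < cap);

    va_start(ap, fmt);
    n = vsnprintf(buf + len, cap - len, fmt, ap);
    va_end(ap);

    if (n < 0) {
        buf[len] = '\0';        /* formatting error: keep the old content */
        return len;
    }
    if ((size_t)n >= cap - len)
        return cap - 1;         /* truncated: the buffer is now full */
    return len + (size_t)n;
}
```

Each call takes the current length and returns the new one, for example len = buf_appendf(msg, sizeof(msg), len, "domain %s\n", domain); where msg, len and domain are the caller's own variables.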

Maybe you need something else, something much simpler: a character driver that returns the contents of /etc/resolv.conf (say major M, minor 1),
then mknod /etc/resolv.conf c M 1 (or restore the node from the CPIO initramfs archive...).
Not exactly what you started in the above example, but close:
you can present the data collected by ipconfig in a form that userland accepts.
/dev/zero can be an inspiration (it returns buffers of zeros), or /dev/null (immediate end of file).
The buffer can be established and filled DURING loading/init.
In your case: on open(), set a pointer to the start of the buffer; each read then returns the smaller of what is left and what is asked; once the buffer is exhausted, return EOF (or an error, whichever is applicable).
(There is no requirement for devices to be in /dev, it is only customary.)
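
The read rule described above can be sketched in plain userland C. This is only an illustration of the logic, not driver code: msg_read and its parameters are made-up names, and a real driver would copy to user space (the kernel helper simple_read_from_buffer implements the same rule).

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Copy up to want bytes of a fixed message into dst, starting at *pos,
 * and advance *pos by the number of bytes copied.
 * Returns the number of bytes copied; 0 means end of file. */
static ssize_t msg_read(const char *msg, size_t msg_len, size_t *pos,
                        char *dst, size_t want)
{
    size_t left;

    assert(msg != NULL && pos != NULL && dst != NULL);

    if (*pos >= msg_len)
        return 0;               /* nothing left: report EOF */

    left = msg_len - *pos;
    if (want > left)
        want = left;            /* hand out what is left, not what is asked */

    memcpy(dst, msg + *pos, want);
    *pos += want;
    return (ssize_t)want;
}
```

Because the position is passed in by the caller rather than stored next to the message, the message itself is never modified by a read.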

ASKER

that is more or less what i am doing.

but i am focusing on the variant which is fully in kernel. i can create the devnode in the initrd. it works.
the pointer is created during init rather than open because the data will never change.
the open func merely sets the pointer address to the buffer created during init.
i am unsure about concurrency but this is not really an issue... yet

... am i missing something ? or did i just neglect to post ALL the relevant code ? sorry if that is the case

i am currently creating the devnode directly in /etc/resolv.conf which should help making the whole thing less breakable imho but that seems debatable

this currently seems to run nginx as init properly with a working dns. i am still exec()ing nginx using busybox but i believe the kernel ought to be able to run it directly as init
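
For comparison, the userland alternative mentioned in the original question (a tiny C program that mounts /proc and then execs the real program) might look roughly like the sketch below. The function names are made up for illustration, and the mount target and program path are supplied by the caller. Mount failures are only reported, since the program may still be able to run without /proc.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <unistd.h>

/* Mount procfs on target. Returns 0 on success, -errno on failure. */
static int mount_proc(const char *target)
{
    assert(target != NULL);

    if (mount("proc", target, "proc", 0, NULL) != 0)
        return -errno;
    return 0;
}

/* Mount procfs on target, then replace the current process with prog.
 * Only returns if execv() fails, with -errno as the result. */
static int mount_then_exec(const char *target, const char *prog,
                           char *const argv[])
{
    int err = mount_proc(target);

    if (err != 0)
        fprintf(stderr, "mount %s: %s\n", target, strerror(-err));

    execv(prog, argv);
    return -errno;              /* reached only when execv() failed */
}
```

A main() that calls mount_then_exec("/proc", ...) with the real program and its arguments, built statically and installed as the init executable of the initrd, would replace the busybox step. The process keeps PID 1 across the exec, so the program still runs as init.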

I probably misunderstood the sentence "copied from chardev.c example" to mean the remainder of the code was left untouched.
Concurrency should not be an issue: the data is read-only and static, and the file position is kept in memory per open file (in the struct file), not in the shared buffer.
The read is probably done in one go anyway, and there should never be more than one process active within one of the dev calls at any moment.
The devnode can be anywhere, no need to be in /dev/...
you may need to verify /etc/ld.so.conf

ASKER

hi noci, i had less time to spend on that project than i hoped for but you may be interested in updates and i value your insight... if you have no further interest in this, please let me know and i will close the thread. likewise, if anyone is interested...

- not compiling multiuser seems to be a no-go : the doc clearly states not having this module does not allow to drop privileges. i am unsure there is anything interesting to find down that road.

- i did initially keep the rest of the chardev.c code verbatim except for the posted device_init func and the not posted device_open func that maps "msg_Ptr" to the "msg" i created during init. i am now testing a variant without try_module_get/module_put calls. no difference regarding anything pertaining to what we discussed in this thread.

- i decided to use chardev with major 53 which happens to be mapped to something i never expect to be used concurrently if at all nowadays. i am unsure this is the way to go but heck, that works.

- concurrency is indeed an issue : the default code allows a single reader at a time. this introduces security concerns as a malevolent user may block other reads by just keeping the file open. i can quite easily handle multiple readers for minimal memory cost since the message won't change so i only need to maintain a pointer per reader but the number of concurrent readers would still be finite. this produces 3 options

1- keep the existing code and consider locking down dns queries is not that much of an issue

2- make it work concurrently. i see little point in doing this since the libc has caches and a malevolent user who can open the device once will likely be able to open it as many times as required to block things. on the other hand, that can prevent a legitimate program from accidentally locking the dns down.

3- write an actual file with the adequate contents, possibly directly in resolv.conf. this raises a whole lot of don't do this flags but works.
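
Regarding option 2, it may help that the kernel already tracks a separate position for every open() of a file, which is the offset pointer a driver's read function receives. A driver that relies on that offset instead of a global pointer and an "already open" counter needs no per-reader bookkeeping of its own. The userland sketch below demonstrates the per-open position with a regular temporary file; read_twice is a hypothetical helper and the /tmp location is an assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write text to a temporary file, open it twice read-only and read n
 * bytes through each descriptor into a and b.
 * Returns 0 on success, -1 on any failure. */
static int read_twice(const char *text, size_t n, char *a, char *b)
{
    char path[] = "/tmp/offsets-XXXXXX";
    size_t len = strlen(text);
    int wfd, fd1, fd2, rc = -1;

    assert(n <= len);

    wfd = mkstemp(path);
    if (wfd < 0)
        return -1;
    if (write(wfd, text, len) != (ssize_t)len)
        goto out;

    /* Two independent opens: each one gets its own file position. */
    fd1 = open(path, O_RDONLY);
    fd2 = open(path, O_RDONLY);
    if (fd1 >= 0 && fd2 >= 0 &&
        read(fd1, a, n) == (ssize_t)n &&
        read(fd2, b, n) == (ssize_t)n)
        rc = 0;
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
out:
    close(wfd);
    unlink(path);
    return rc;
}
```

Both descriptors return the start of the file, because the read through the first one does not move the position of the second.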

- i still have not encountered any major issue running no regular init program. i have not tried to produce zombies voluntarily, though. i will at some point if anybody is interested.

- ld currently works verbatim and i don't have an ld.so.conf. i already tried non standard paths. this works by setting LD_LIBRARY_PATH anywhere in the environment including in kernel parameters. interesting tip, though that allows to separate the boot environment vars from something that is proper to the generation of the filesystem. i believe using standard paths is simpler, but all libs in a single location is an idea.

- i am working on removing as much useless bloat as possible from the kernel. currently, i'm solely focusing on useless drivers and skipping compilation hacks. i would like to produce a minimal config with enough to drive a small diskless appliance or raspberry, and configs for various virtualized diskless targets ( currently bhyve and vmware). feel free to chip in if anyone has insight. i will open a separate thread if required.

thanks for the help and hope i'm not wasting anyone's time

Major unit number is "not an issue" as long as you don't choose one that is well known for something else.
AFAICT there is no wild guessing software trying random major numbers, creating a device and using it...
I see 2 options there: either a) map /etc/resolv.conf to the C-dev, or b) map it somewhere else and use cp to copy it to /etc/resolv.conf (likely requiring a shell script as init).
a) is usable for simple containers only running one thing.
b) is usable to be safe for multiple use, might be an option if you have an init script anyway.
LD_LIBRARY_PATH will work as well, but it does require a script on startup.
ld.so.conf is a static thing you can put into an initramfs image.

(i read multiuser as the init 3-5 modes.., not the kernel support..., without kernel multiuser you only have the root user => no security).

wrt. wasting time... this is Hacking... walking the borders of intended use. Always interesting.

ASKER

Major unit number is "not an issue" as long as you don't choose one that is well known for something else.
AFAICT there is no wild guessing software trying random major numbers, creating a device and using it...

indeed, lol : that was only test code to check i managed to create a device without digging too much.
note : it is actually likely feasible to let the system determine the major and create the device in kernel code but i see little point in doing that.

I see 2 options there: either a) map /etc/resolv.conf to the C-dev, or b) map it somewhere else and use cp to copy it to /etc/resolv.conf (likely requiring a shell script as init).
a) is usable for simple containers only running one thing.
b) is usable to be safe for multiple use, might be an option if you have an init script anyway.

i ended up creating both a dev node with major 53 AND an actual file in /etc/resolv.conf.pnp.
the intended use is to symlink /etc/resolv.conf -> /etc/resolv.conf.pnp
pretty sure neither would ever make their way mainstream but i hope i'm wrong ;)

the test code that writes the file is here. it clearly needs some improvements, but it does create the file as expected

static void write_file(const char *filename, const char *data, size_t len)
{
    struct file *file;
    ssize_t written;

    /* filp_open() does not return NULL or false on failure: it returns an
     * ERR_PTR() value (for instance when there is no filesystem to write
     * to yet), so it must be checked with IS_ERR() before any use.
     */
    printk(KERN_INFO "write_file(): file=%s len=%zu\n", filename, len);
    file = filp_open(filename, O_RDWR | O_CREAT, 0644);
    if (IS_ERR(file)) {
        printk(KERN_WARNING "write_file(): cannot open %s: %ld\n",
               filename, PTR_ERR(file));
        return;
    }

    /* newer gym. starts working somewhere along 4.x branches
     * note : kernel_write works without the old_fs() stuff
     */
    written = kernel_write(file, data, len, &file->f_pos);
    if (written < 0)
        printk(KERN_WARNING "write_file(): write to %s failed: %zd\n",
               filename, written);
    else
        printk(KERN_INFO "write_file() wrote %zd of %zu bytes into %s, new pos=%lld\n",
               written, len, filename, file->f_pos);
    filp_close(file, NULL);

#if 0
    /* older gym, would replace the kernel_write() call above.
     * not sure this works verbatim but something along those lines did */
    mm_segment_t old_fs;
    loff_t pos = 0;
    old_fs = get_fs();  /* save the current FS segment */
    set_fs(KERNEL_DS);
    vfs_write(file, data, len, &pos);
    set_fs(old_fs);     /* restore the saved FS segment */
#endif
}


this raises one extra possibility which is to mount /proc temporarily and copy the file within a tiny kernel module. there is no real advantage to this approach except it can fit in a self-contained module rather than hack existing kernel code. i have an existing module that mounts /proc so the task in itself seems rather trivial at this point.

i'm posting the code in case you are interested. the exit() function seems useless when the module is compiled in. i think it can be omitted entirely.

buildkernel_patch_mountproc(){
    
    echo "buildkernel_patch_mountproc(): create drivers/mountproc"
    mkdir -vp drivers/mountproc || return 1
    
    echo "buildkernel_patch_mountproc(): write drivers/mountproc/Makefile"
    echo 'obj-$(CONFIG_MOUNTPROC) += mountproc.o' | tee drivers/mountproc/Makefile || return 1

    echo "buildkernel_patch_mountproc(): write drivers/mountproc/Kconfig"
    tee drivers/mountproc/Kconfig <<KCONFIG >/dev/null || return 1
menu "Mount Procfs"
config MOUNTPROC
        tristate "mountproc module"
        # depends on IP_PNP_DHCP
        depends on BLK_DEV_INITRD
        depends on PROC_FS
        default y
        help
          mountproc module.

endmenu
KCONFIG

    echo "buildkernel_patch_mountproc(): write drivers/mountproc/mountproc.c"
    tee drivers/mountproc/mountproc.c <<MOUNTPROC >/dev/null || return 1
#include <linux/module.h> /* Needed by all modules */
#include <linux/kernel.h> /* Needed for KERN_INFO */
#include <linux/init.h> /* Needed for the macros */
#include <linux/fs.h> 
/*#include "do_mounts.h"     in ./init/do_mount.c */

static int __init mountproc_init(void) {
    int mnt_procfs;

    printk(KERN_INFO "mounting proc filesystem...\n");

    /* do_mount() returns 0 on success and a negative errno on failure */
    mnt_procfs = do_mount("none", "/proc", "proc", 0, NULL);
    if (mnt_procfs) {
        printk(KERN_WARNING "failed to mount procfs: %d\n", mnt_procfs);
        return mnt_procfs;
    }

    printk(KERN_NOTICE "procfs was mounted on /proc\n");
    return 0;    /* an init function must return 0 on success */
}

static void __exit mountproc_exit(void) {
        printk(KERN_INFO "Goodbye, world\n");
}
  
module_init(mountproc_init);
module_exit(mountproc_exit);
MOUNTPROC

    echo "buildkernel_patch_mountproc(): append drivers/Makefile += mountproc/"
    grep mountproc drivers/Makefile \
    || echo 'obj-$(CONFIG_MOUNTPROC)       += mountproc/' | tee -a drivers/Makefile >/dev/null \
    || return 1
    
    echo "buildkernel_patch_mountproc(): edit drivers/Kconfig source \"drivers/mountproc/Kconfig\""
    grep 'drivers/mountproc/Kconfig' drivers/Kconfig \
    || sed -i.tmp -e 's:^endmenu:\nsource "drivers/mountproc/Kconfig"\n\n&:' drivers/Kconfig
    grep 'drivers/mountproc/Kconfig' drivers/Kconfig || return 1

    # TODO: this does not check the option actually exists
    
    ./scripts/config --enable CONFIG_MOUNTPROC || return 1
}


wrt. wasting time... this is Hacking... walking the borders of intended use. Always interesting.

yeah. actually i have had a working variant for months using busybox. i did not expect to be able to write working C code in a reasonable amount of time. turns out hacking kinit is rather easy, and writing a kernel module is much easier than i expected. unfortunately, what i lack most is actual C knowledge and i'm quite afraid to even test my hacked kernel on a real production server. ... but this at least demonstrates this is all feasible.

thanks for your help. feel free to bounce ideas and/or review my C code.


A "thank you" is always nice to get.... just like having someone smile.