difference between mmap and malloc/valloc + mmap fixed

hello all

i'm trying to modify an existing application in such a way that a line appears to be prepended to all the files opened through a function that either loads the file into a buffer or mmaps it, depending on its size.

the function returns a struct that contains the memory address where the file is mapped and its length.

only reads are expected, so there is no need to bother about changing file lengths

---

obviously the buffer case is trivial and is already solved, but i have concerns regarding the mmap case.

the solution i came up with consists of allocating more space than needed using malloc/valloc, performing a MAP_FIXED mmap at the first page boundary inside the allocated space, writing the string just before the mmapped address, and returning a pointer to the beginning of the string.

i also tried to use an anonymous mmap bigger than needed instead of a valloc, but ended up with segfaults.

---

i'm concerned about performance. i'm not a C developer ( this is one of the first C programs i toy with ) but i do program in other languages and can read C and understand what the code does. unfortunately i'm not really confident, especially when it comes to memory management, so i need a deeper understanding of what mmap and malloc do in order to determine whether my solution is workable and/or find a better one.

the program is a custom imap server that i want to toy with in order to migrate away from it. i need to move about 100 TB of data in an unknown number of files ( i guess around 10^8 ). it compiles on old 32-bit debian versions and i'd rather not even attempt to port it to anything newer, or toy with other parts of the code.

here come my not really exhaustive questions

do i need to malloc the whole file's length before performing the mmap, or can i just map a few pages ?

what would be the difference in terms of memory consumption between a regular mmap and my current solution ?

can anybody confirm that the malloc will not actually write anything nor reserve ram space ? ( i'm obviously expecting virtual space to be reserved )

is there a better solution ? maybe something to produce a "glue" buffer that would actually be a buffer continued by an mmap, without that much toying ?

any other comments ? ideas ?

thanks for your time

skullnobrains

ASKER

i'm currently experimenting with this construct which seems safer

memfile->addr = mmap( NULL , length + getpagesize() , PROT_WRITE,MAP_SHARED|MAP_ANONYMOUS, NULL, 0);
mmap(memfile->addr + getpagesize(), length, PROT_READ,MAP_SHARED|MAP_FIXED, fd, 0);
memfile->addr = memfile->addr + getpagesize() - strlen ( mystr ) ; /* TODO check about the trailing \0 */
length = length + strlen ( mystr );
strncpy (  memfile->addr , mystr , strlen ( mystr ) - 1 ); /* not sure about the length, here */

in replacement of

memfile->addr = mmap(0, length, PROT_READ,
			   MAP_SHARED, fd, 0);

what do you think ?

sarabande

allocating more space than needed using malloc/valloc

don't see any malloc/valloc in your code snippet. if using mmap with input address 0 the OS will safely allocate memory of the requested length rounded up to page size. it would free the memory if you call munmap with the address returned from mmap and length. apparently you would use either malloc or mmap but not both.

i'm currently experimenting with this construct which seems safer

don't see why a complex sequence of 5 statements with a lot of flags - some of them incompatible with each other - should be safer than one simple statement which does exactly what it should do: map the beginning of an opened file into the process memory and let the OS choose where to allocate. also the code sequence seems buggy, as you are overwriting the mapping address returned by mmap and are therefore not able to call munmap and free the allocated memory.

see http://beej.us/guide/bgipc/output/html/multipage/mmap.html to get an idea what the mmap can be used for and what should be avoided.

the segmentation faults you got are likely because you requested read-only memory with mmap and then tried to copy a string to the protected buffer.

Sara

skullnobrains

ASKER

i solved the segfaults which is why i can now use a twin mmap construct rather than a malloc/valloc + mmap.
actually you were right : i had the wrong flag with the map_anon. the snippet i provided in my last post does work more or less as expected and does not segfault.

the snippet with valloc was pretty much the same replacing the first line with
memfile->addr = valloc( length + getpagesize() );

i cannot use a single mmap, because i need to return a pointer to a memory zone that contains a specific line BEFORE the beginning of the file.

using mmap with a null or zero first arg is pretty much a no-go: i'd need to create a buffer in the few bytes BEFORE the returned address, and that memory zone will not always be free. this is why i attempt to reserve the file length + at least the size of the prepended line before calling mmap

---

if you know a better way to do this i'm all ears, knowing that the function i'm modifying returns a struct containing a memory address and a length, and i need to return the same thing but lure the rest of the app into believing that the file contains the concatenation of a header line and its actual contents, without modifying the file.

i have not experimented yet with negative offsets, but i assume they will either not be supported or trigger a segfault if i try to write in the space before the actual beginning of the file.

many thanks for your time

skullnobrains

ASKER

my question does not seem very properly phrased, so here is what i need
[X] represents a memory address for future reference

the current code produces

[A] <mapped file>

and returns [A] as the address and the file's length

and i need to change it to

[A] < text string > [B] < mapped file >

and return address [A] and the sum of the string's and file's lengths
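
to make the target layout concrete, here is a minimal sketch of one way to build it with two mmap calls. it is only an illustration under assumptions: the names map_with_prefix and struct prefixed_map are hypothetical and not part of the existing code, the header is assumed to fit in one page, and the file is assumed to be non-empty.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* hypothetical result type: start of the header and total readable length */
struct prefixed_map {
    char  *addr;   /* [A]: first byte of the prepended header */
    size_t len;    /* header length + file length */
};

/* maps file_len bytes of fd read-only, preceded by hdr (hdr_len bytes,
 * at most one page). returns 0 on success, -1 on failure. */
static int map_with_prefix(int fd, size_t file_len,
                           const char *hdr, size_t hdr_len,
                           struct prefixed_map *out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *base;

    if (hdr_len > page || file_len == 0)
        return -1;

    /* step 1: reserve one writable page plus room for the whole file */
    base = mmap(NULL, page + file_len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* step 2: replace everything after the first page with the file, [B] */
    if (mmap(base + page, file_len, PROT_READ,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
        munmap(base, page + file_len);
        return -1;
    }

    /* step 3: copy the header so that it ends exactly at [B] */
    memcpy(base + page - hdr_len, hdr, hdr_len);

    out->addr = base + page - hdr_len;
    out->len  = hdr_len + file_len;
    return 0;
}
```

the first page stays anonymous and writable so the header can be copied in; only the pages after it are backed by the file, so the file itself is never modified.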

sarabande

it already was sufficiently phrased. my problem is that i am not quite sure whether i should support your approach cause my guts feeling says that you better would go with a separate buffer where you copy the relevant parts into. as far as i understood your requirements, you have already implemented this part for smaller files. if so, i don't know why you think that a mapping would be the better choice for bigger files. the memory you requested was 'length + getpagesize()'. if length is the size of a structure, it is a constant size and independent of the file size, so the mmap actually would map one page of the file, which is normally 4k. if that is correct, i don't understand why you don't use a fixed buffer on the stack which is sized 'length + page size', first copy the text to it, and then read the first page of the file (or less if the file is smaller than a page) into the buffer such that the two parts are concatenated.

#define min(a, b) ((a) < (b) ? (a) : (b))  // not provided by standard C

char buffer[sizeof(MyStructure) + 4096] = { 0 };  // buffer on the stack
...
strncpy(buffer, mystr, min(strlen(mystr), sizeof(MyStructure)));
struct stat fs = { 0 };
if (stat(myfilepath, &fs) == 0)
{
       FILE * pFile = fopen(myfilepath, "rb"); // read binary
       if (pFile)
       {
              // append at most one page of file data right after the text
              size_t nread = fread(&buffer[strlen(buffer)], 1, min(4096, fs.st_size), pFile);
              if (nread > 0)
              {
                      // success
              }
              fclose(pFile);
       }
}

Sara

skullnobrains

ASKER

i am not quite sure whether i should support your approach cause my guts feeling says that you better would go with a separate buffer where you copy the relevant parts into

i pretty much agree with your concerns and also believe that in this specific case, reading the file incrementally would be much simpler.

actually in the case of a buffer, my code is pretty much the same as the one you posted. but some files can be huge so i was assuming the mmap helped saving memory (am i wrong there ?). the existing code will use a buffer up to some configured size (32k i recollect), and mmaps for bigger files

---

length is actually the length of the file, sorry if that was not clear

actually the code i provided seems to work fine with a couple tweaks

     memfile->addr = mmap( NULL , length + getpagesize() , PROT_WRITE,MAP_SHARED|MAP_ANONYMOUS, NULL, 0);
      mmap(memfile->addr + getpagesize(), length, PROT_READ,MAP_SHARED|MAP_FIXED, fd, 0);
      memfile->addr = memfile->addr + getpagesize() - strlen ( mystr ) - 2 ; 
      length = length + strlen ( mystr );
      strncpy (  memfile->addr , mystr , strlen ( mystr ) );

i allocated a whole extra page just to write the string which is not very smart but i don't see an easy way to allocate less memory while being able to do the mmap fixed on a page boundary

the second mmap call actually removes previous mappings so i would assume my code can hardly be worse than the existing one

what i'm concerned with right now is about unmapping this stuff properly : it is quite unclear to me whether unmapping the address of the first page will remove the mapping on the next page or not.

thanks for your help

alex

sarabande

but some files can be huge so i was assuming the mmap helped saving memory (am i wrong there ?).

your current code maps only the first page of the file regardless of whether it is huge or small. you could increase the length argument to map a bigger portion of the file, but you can't map more than there is contiguous free (heap) memory if you let the OS decide at which address the mapping should begin. if you choose the address yourself, the code is no longer portable and you have to exclusively reserve the memory space with malloc (heap memory) or by using a sufficiently big buffer on the stack. when using heap memory, the maximum space may vary since you need contiguous space and free heap memory gets divided into smaller pieces over the lifetime of your process. on a 32-bit platform with 4 gb memory you could try using the 4th gb which is not used by 32-bit windows. however, that is not easy to accomplish and rarely is an option for a normal windows application.

can you explain why you want a pointer to a buffer that contains some individual data concatenated to the file contents of files of arbitrary size? what is the maximum size possible? are the files text files? if yes, you should know that a file mapping points to the binary contents of the file, which may include a BOM at the beginning and a carriage return + linefeed pair (CRLF) at each line break, while reading from a text file would skip the BOM (if present) and turn each CRLF into a single linefeed. you can read in binary mode (as in my code), but then it is not really a string that can be displayed properly. generally, if you want to handle big texts which may not fit into heap memory, you better handle them as an array of text lines (strings or pointers to char). then each line needs contiguous memory but not the whole text, which could tremendously increase the total amount of text held in memory. you can also simply add lines at the beginning of the array and process the array in chunks - say 1000 lines are in memory, and the next 1000 lines are loaded when needed.

Sara

sarabande

what i'm concerned with right now is about unmapping this stuff properly :

for each call to mmap where you passed 0 as first argument and a length value > 0, the os will allocate memory at the heap (most likely) whose size is the requested length rounded up to a multiple of pagesize. you have to free the memory by calling munmap, passing the address that was returned by the corresponding mmap and the same length as requested.

because of that you may not modify the variable where you stored the returned address unless you save the address somewhere else.

the second mmap call actually removes previous mappings

don't think so. each call to mmap establishes a new mapping, and the only effect of the second mapping is that you use a fixed address (derived from the address you got from the first mapping) and therefore map the second page of the file. since you don't store the returned address and the flags passed with the call are irrelevant for a write-protected mapping, i would say that the second mapping has no effect besides adding an entry to the mapping table.

the strncpy however, would write a string to the first page contents and overwrite the existing contents of the first page at this part. as the mapping is read-only it is not relevant for the file but for any other mapping done at the same file page. i don't know what you intend with the code but would assume it is wrong. also strncpy probably is the wrong tool here, since it fills the target with zero characters whenever the count is larger than the string length, which could overwrite the first character of the second page. you have to use memcpy if you want to overwrite existing file contents.
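
a small sketch of the memcpy variant, assuming the header has to end exactly where the mapped file begins (place_header_before is a hypothetical helper, not something from the code under discussion). memcpy writes exactly the requested number of bytes, with no terminator and no padding:

```c
#include <stddef.h>
#include <string.h>

/* copies hdr so that its last byte sits immediately before file_start;
 * writes exactly hdr_len bytes and never touches file_start itself.
 * the caller must own at least hdr_len writable bytes before file_start. */
static char *place_header_before(char *file_start,
                                 const char *hdr, size_t hdr_len)
{
    char *dst = file_start - hdr_len;

    memcpy(dst, hdr, hdr_len);   /* no terminating zero, no padding */
    return dst;                  /* new start of the combined region */
}
```

strncpy only adds zero bytes when the count is larger than the source string, so with strncpy the count has to be chosen carefully; memcpy avoids the question.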

Sara

skullnobrains

ASKER

Hmm, this is a lot of useful information, but part of it does not totally apply unless i'm missing something :

Length is initially populated with the file's length, and the second code snippet does work as expected now.

The second mapping uses the address returned by the first and adds one page so unless i'm missing something, it should remap all of the pages allocated by the first mmap except for the first page.

I can properly unmap because the first address of the anonymous mapping can be deduced by flooring the returned address to a multiple of the page size: addr - (addr modulo page size).

The servers are old debian. A mixture of woody, potato and sarge.

---

Regarding the details you asked for

The app is an imap server. The files are text files. Line endings are normalised in a different part of the code.

I'm hacking it so it prepends an extra header that contains the pop uidl so pop users don't get to redownload their email after the migration, which will be performed using imapsync or a similar, possibly hacked, tool.

During the migration process, the server is able to display a fake combined inbox with the new and legacy contents which are fetched on demand over imap and indexed locally. Likewise, imap users are presented a legacy virtual folder which is mapped to the old platform.

Thanks again for your help

skullnobrains

ASKER

I forgot to add that i really do not want to change the files contents

sarabande

Length is initially populated with the file's length

didn't you say that length is the size of a structure in your initial post?

if length is the file size and mmap returns a valid address you would be able to access the whole file contents like an array. as you used the SHARED flag, the OS would do dynamically swapping if some pages of the buffer are not yet loaded. you might get problems if you do an operation on the buffer that would update more than one page of the underlying file.

The second mapping uses the address returned by the first and adds one page so unless i'm missing something, it should remap all of the pages allocated by the first mmap except for the first page.

you were using the MAP_ANONYMOUS flag for the first mapping. this flag means that the mapping isn't bound to a file (fd and offset are ignored) and you get a pointer to shared (and zeroed) memory of the requested size. the memory is reserved in the virtual address space but can be swapped, which means that pages not currently loaded may be read from the swapfile at runtime. the second mmap now tries to map the file starting at the second page of the shared memory, and you assume that the second mmap does a remapping there since the previous mapping is removed.

i read the docs at http://www.gnu.org/software/libc/manual/html_node/Memory_002dmapped-I_002fO.html. and indeed they say

Any previous mapping at that address is automatically removed.

where 'address' refers to the address returned by a previous mmap. as you said that your code was working, you are probably right that the second mmap would do some kind of remapping, even if the address passed (and probably returned) was not exactly the same as the one returned by the first mmap. i wonder whether this always would work and whether the memory reserved by the first call still was reserved after you do the remapping. there are three possible risks:

for one, the first page reserved by the first mmap is no longer included in the area reserved by the second mmap. hence, the os could use the page for other purposes.

for two, if the first mapping was removed, the reservation for the whole mapping was freed. if the os or a concurrent thread needs a new reservation, it is not impossible the just freed memory was used for this purpose. if so, the second mapping probably would fail (what you don't check) and that would mean that there is no reservation and no mapping.

for three, i could think (and didn't find different in the docs) that a mapping with MAP_FIXED is likely to make not a reservation of memory at all since it was assumed that the address given already points to allocated memory. if so, the mapping would happen at non-reserved memory what would crash sooner or later.

so it prepends an extra header that contains the pop uidl

how do you provide the 'hacked' mail as a file or as a buffer? if the second, you have to pass very big files in chunks anyhow. so why not pass the extra header and then the rest page for page? if you pass it as a file i would assume that a temporary file was created. if so, you rarely have any advantage from the complex mappings but simply could write the header and then copy the file contents in big chunks. you even could use file mapping to read the chunks very efficiently.

Sara

skullnobrains

ASKER

didn't you say that length is the size of a structure in your initial post?

not quite, i mentioned a struct that contains the length and initial address, but that was probably poorly phrased, sorry.

access the whole file contents like an array. as you used the SHARED flag, the OS would do dynamically swapping if some pages of the buffer are not yet loaded. you might get problems if you do an operation on the buffer that would update more than one page of the underlying file.

ok, thanks a lot, i think i get it a little better.

i'm not really concerned about reading multiple pages at once as the existing code apparently handles that properly. can you confirm that in such cases the code would rather break than return junk data ?

as you said that your code was working, you are probably right that the second mmap would do some kind of remapping, even if the address passed (and probably returned) was not exactly the same as the one returned by the first mmap. i wonder whether this always would work

it does work, and i found a doc that explicitly states that all of the matching pages are remapped, so i would assume this is safe.

i can confirm that the returned address is always the one that you pass when using MAP_FIXED according to my tests. the doc says likewise but i find it rather unclear.

you're right, i'm very concerned with unmapping the first page. given the complexity of the code it is quite a pain to trace the unmap. what is worse is that munmap is quite happy with unmapping things that are not mapped, and i'm pretty sure the return value is not handled

i wonder whether this always would work and whether the memory reserved by the first call still was reserved after you do the remapping

i'd assume yes, but i don't really know how to determine if that is the case for sure. what i can confirm is that writing to this memory segment after the second mmap does not segfault.

for two, if the first mapping was removed, the reservation for the whole mapping was freed. if the os or a concurrent thread needs a new reservation, it is not impossible the just freed memory was used for this purpose

ouch ! i thought this operation would be atomic but i must admit i did not check. do you have a reason to believe otherwise ?

btw, don't worry, i'll most definitely add decent error handling before i commit the code. but i still have problems with other parts of the code (and actually very basic C programming stuff), and i'd like to lift all the concerns mentioned in this thread.

for three, i could think (and didn't find different in the docs) that a mapping with MAP_FIXED is likely to make not a reservation of memory at all since it was assumed that the address given already points to allocated memory. if so, the mapping would happen at non-reserved memory what would crash sooner or later

yes. i perform the first mmap in order to reserve the memory. am i missing something here ?

if you're interested, i played with MAP_FIXED a little, and if you don't reserve the memory in the first place, it segfaults as expected (unless you're very lucky, of course)

should i understand that the second mmap may unreserve the memory initially reserved by the first one ? in that case would a malloc/valloc make a difference ?

how do you provide the 'hacked' mail as a file or as a buffer? if the second, you have to pass very big files in chunks anyhow.

i think neither :

when the file is small, the existing code mallocs and loads the file in a buffer.

when it is bigger, it mmaps the file

in both cases, the function i'm working on returns a struct that contains an initial memory address and a length.

there is a wealth of different functions that use this struct and access the memory directly. if i were to write the program i'd probably go for a simple fopen and sequential reads and writes, but this code seems to index the memory segment in order to locate headers and body parts, and then accesses those parts randomly. porting it to fopen/fread/fseek calls is more than i can handle. i'd rather write my own minimal imap server in php or python than try to go that way.

i'm unsure what happens in the case of write operations, but fortunately i don't need to deal with them.

... so i don't really get to "pass" any actual data :

there is no temp file and i cannot write to the disk anyway both for performance and security reasons. i'm dealing with 100 terabytes or so, so i cannot afford to create temp files for read operations. if i could, i'd run a simple shell script and prepend the header to every file, rather than toy with tens of thousands of lines of barely compiling code in a language i'm not comfortable with ;)

i assume in this case the chunking is actually performed by the kernel, using the magic beneath mmap2 (aka the file is actually accessed by chunks of page_size, with dirty buffer being asynchronously written to disk whenever the buffer cache or that specific memory segment gets flushed )

--

thanks again. you're being very helpful including by bouncing ideas and stating concerns ( some of which i had missed )

skull

ASKER CERTIFIED SOLUTION

sarabande

skullnobrains

ASKER

ok, i think i get it.... mostly ;)

if i understand correctly, the malloc/valloc would produce non-swappable memory while the mmap produces swappable

i have a little concern with the idea of reading the whole file into an mmaped memory segment : there are many cases in which only the headers are needed, so i'm afraid i'll put quite a lot of useless strain on the disk ( which is an nfs mount ). actually i guess i'll add the MAP_NORESERVE flag so that no swap space is reserved for the mapping before it is actually used (but the program may crash if there is no physical memory available).

i think i'll add some error checking and possibly a retry on the 2 mmaps, in case it is actually possible for something to get mapped at the wrong place while the second mmap is performed.

i looked a little bit into this issue, and found little to no information online, nor source code i can understand for sure, but various sources indicate that mmap is atomic on unices but not on windows

if you're interested, see for example this thread :
https://code.google.com/p/nativeclient/issues/detail?id=1848

Linux/Mac's mmap() syscall can atomically replace a memory mapping. However, Windows does not (as far as we know) provide an API for doing the same. Consequently, on Windows, NaCl's mmap() call works by unmapping pages from an address and then remapping at this address.

anyway thanks again for your precious help

best regards

skull

skullnobrains

ASKER

not an actual "solution" but all the help i was expecting.
great and detailed comments, some of which i assume required quite some research, and some time to think through...
thanks for your time !

sarabande

i looked a little bit into this issue, and found little to no information online,

a good way to find out what is possible and where you have to take care is to look at the possible errors you could get.

if none of those errors actually is very likely to occur since the description doesn't fit to your case, and if you also avoid code which was described as being "undefined" or "unpredictable behavior" you are rather safe.

i have a little concern with the idea of reading the whole file into an mmaped memory segment

it is easy and safe. two strong arguments. if you got performance issues, it may be the easier way to provide two buffers, one for the pop header and one with the mapped file, and let the later functions handle the case where they have to decide whether a pop header is needed or not.
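
a minimal sketch of what such a two-buffer view could look like. the struct and function names are invented for illustration and are not taken from the imap server; the point is only that readers address one logical offset range while the header and the mapped file stay in separate memory.

```c
#include <stddef.h>
#include <string.h>

/* hypothetical view: a small header buffer logically followed by the file */
struct two_part {
    const char *hdr;    /* pop header, e.g. in a malloc'ed buffer */
    size_t      hdr_len;
    const char *body;   /* plain mmap of the file, left untouched */
    size_t      body_len;
};

/* copies up to n bytes starting at logical offset off into dst;
 * returns the number of bytes actually copied */
static size_t two_part_read(const struct two_part *tp, size_t off,
                            char *dst, size_t n)
{
    size_t done = 0;
    size_t k;

    if (off < tp->hdr_len) {
        /* the range starts inside the header */
        k = tp->hdr_len - off;
        if (k > n)
            k = n;
        memcpy(dst, tp->hdr + off, k);
        done = k;
        off = 0;                 /* continue at the start of the body */
    } else {
        off -= tp->hdr_len;      /* translate into a body offset */
    }

    if (done < n && off < tp->body_len) {
        k = tp->body_len - off;
        if (k > n - done)
            k = n - done;
        memcpy(dst + done, tp->body + off, k);
        done += k;
    }
    return done;
}
```

the cost of this design is that every function accessing the memory directly has to go through such an accessor, which is exactly the part that may be hard to retrofit into existing code.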

there are many cases in which only the headers are needed

if so, it should be much easier for the following functions to handle the pop header in an extra buffer than for you to provide a mapping with an extra page at the beginning.

good luck.

Sara

skullnobrains

ASKER

luck was great : seems to work like a charm, and i even managed to unmap the segment properly.

my only regret is using up 4096 bytes for a 34-byte string. i have no idea how to do better when using mmap.

it is easy and safe.

actually, i did some tests, and the difference is HUGE : the current code will reserve memory but not allocate it unless it is needed. when reading big files, the system is even able to swap out ( or actually simply destroy ) already read pages. when reading headers, only 1 or 2 pages are read since prefetch is disabled.
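
for anyone who wants to repeat this kind of measurement, one possible way (not necessarily how the tests above were done) is to ask the kernel which pages of a mapping are resident using mincore(). a minimal sketch, assuming linux with a libc that exposes mincore(); resident_pages is a made-up helper name:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes mincore() and MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* returns the number of pages of [addr, addr + len) currently in RAM,
 * or -1 on error; addr must be page-aligned */
static long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + page - 1) / page;   /* pages covered by the range */
    unsigned char *vec = malloc(n);       /* one status byte per page */
    long count = 0;
    size_t i;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (i = 0; i < n; i++)
        count += vec[i] & 1;              /* low bit set: page is resident */
    free(vec);
    return count;
}
```

calling it on the page-aligned start of the mapping before and after parsing the headers gives a rough idea of how many pages were actually touched.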

note that my personal feeling on the matter is that a regular fopen would have been MUCH better given the use case, and even better if/when caching most of the data found in headers in a local .db file.

--

i understand why you suggested to store the header in a separate buffer. it is obviously better, but i feared i would encounter too many problems given the complexity of the existing code. anyway it was the better suggestion, just not feasible safely and in a reasonable amount of time at my current level of knowledge given the existing code. it would have been simpler for me to write a complete imap server in php or python, using the same .db format as the existing one to get the flags, than to do this

best regards

sarabande

i have no idea how to do better when using mmap.

there is no other way. mapping means to map pages and the minimum is one page.

good luck.

Sara