• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 504
  • Last Modified:

Reversing large character strings efficiently in C++

If we have character string as large as few Gigabytes( take for example 4 GB), whats the most efficient way to reverse it using c++ from time and space point of view? Basically assume that a function will recieve this string as input parameter and the same function should returned reversed string. Performance/speed is the top criteria here. I undertand generally we can use recursive functions, third party libray api's etc for string reversal but in situations where strings are few GB long, that approach may not work. I searched but didnt find any relevant post in this forum. Any ideas? Thank you.
0
James Bond
Asked:
James Bond
  • 5
  • 3
  • 3
  • +3
6 Solutions
 
JIEXACommented:
I suppose that the string is a contiguous byte array in virtual memory of a process, and you know its length.

There are some thoughts that can improve performance.
1. Accessing 4 single bytes is worse than accessing single 4-byte integer. In the case siring length is 4*k bytes, switch integer pointers, and then switch bytes in these integers.
2. Cache lines and TLS are, probably, not issues because in a simplistic way you work sequentially.
3. MMS instruction set might give some additional value.
0
 
UltraDogCommented:
It depends on what you want to do with the string.
You can keep it in the original byte order, but provide interface methods which will read or manipulate the string backwards as you wish.
0
 
James BondSoftware ProfessionalAuthor Commented:
JIEXA,

Assume that we dont know string length. You might be knowing string terminator charatcter like say '$'. Any idea how to handle this scenario?

Also to help me further can you please elaborate on MMS instruction sets? I didnt find any links on web so you can share those if you have any. I would basically like to know a psuedo algorithm to handle this case.

To repeat, i want to take this string( or a stream of characters) as an input and return reversed string as output.

Thanks.
Mumbaikar
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
evilrixSenior Software Engineer (Avast)Commented:
Have you considered just trying the standard C++ reverse algorithm?
http://www.cplusplus.com/reference/algorithm/reverse/

Complexity
Linear: Calls swap one half the length of range [first,last) times.
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> I undertand generally we can use recursive functions, third party libray api's etc

Generally the simplest way (and this is how the STL algo works) is to start at each end and swap the first and last (N) char and then move in one at each and a swap the 2nd and one before last (N-1) char and so on until you meet in the middle. There is no additional space required (apart from a single temporary char for the swap of each character pair) and the complexity is linear by half the length of the string O(N/2).
0
 
HappyCactusCommented:
Use the string reversely, instead of changing the ordering.
For example, when printing, use reverse_print() like this:


void reverse_print (char *x, int len, FILE *file)
{
 char *p = x+len;
 while (p >= x) {
  fwrite(file, p, 1);
  --p;
 }
}

Open in new window

0
 
JIEXACommented:
Basically, I see no better way -- algorithmically -- to do other than than (see below the code).


The searching for terminator will evict cache lines one after another, but I see no better way to find out the end.

Sorry, that was MMX and SSE instruction sets, not MMS. But again, there is no algorithmic improvement.

void reverse_str(char * s, char sep)
{
  size_t i,len;
   char * end = strchr(s,sep); /* we suppose no zero bytes allowed */
   if (end == NULL) return; /* no separator -- error might be needed */
   len = end - s;
   for(i=0;i<len/2;i++)
   {
      char tmp = s[i];
      s[i] = s[len-i-1];
      s[len-i-1] = tmp;
   }
}
(code not tested).

Open in new window

0
 
JIEXACommented:
The main question you need to ask first is how this string was put there to the memory. This way you can probably find its length significantly faster, and then you can use either method for reversing that experts offered (STL etc).
0
 
JimBeveridgeCommented:
The performance of your algorithm is almost uninteresting compared to the performance of main memory. Access to main memory is SLOW. That's why there's an L1 and an L2 cache. The amount of time it takes to read and write the L1 cache to/from main memory is so large that the time it takes you to reverse the bytes is almost uninteresting.

So the absolute theoretical best you can do is memory subsystem transfer speed times the amount of data divided by two (read/write). Start with the naive approach (single threaded, STL reverse) and see how the result compares to the theoretical best. Don't bother optimizing anything until you've done this.

You can learn more about cache hits, transfer speed, etc using Intel's vTune.

If you need to optimize, start by spreading the load across all of your cores using multithreading. My guess is that this will completely saturate the memory bus and your algorithm will be a non-issue.

If you must optimize (for example, because you can't multithread the system), fill a processor cache line from the source data. Don't overfill because staying within a cache line is critical. You may need special handling for the buffer's beginning and ending bytes, depending on buffer alignment.

Reverse the data in the cache line. Again, I think that the algorithm probably doesn't matter much compared to the cost of an access to main memory. STL reverse is a good place to start.

Finally, write the data to the destination buffer as int64 data to move 8 bytes at a time.
0
 
James BondSoftware ProfessionalAuthor Commented:
I have never done this before so it would be great if anybody wants to explain how to use L1/L2 cache on demand or you can direct me to any relevant links where I can understand it. How do I know whats max size of L1/L2 cache on my machine?

Regarding manipulating this 4GB string, how do we store this string - which memory do we use? STL reverse algo is good but standard hasn't defined max size of string in bytes. I was able to store somewhere around 100000000 chars in STL string on my machine which is 32 bit machine. I am still testing this thing with some algorithms given above including STL reverse. It will take me some time to accept any answer till then. Meanwhile any comments on my above questions will definitely help. Thanks gyus.
0
 
JIEXACommented:
If your machine is 32-bit, you are not able to store the string of exactly 4GB in the virtual memory of a process. So, the main question I'm interested in is: how this string is put into the main memory?

The sizes of the L1/L2 cache you can find by using cpuid's cpu-z application, or find your CPU in wikipedia, there's its L1/L2 cache specs.

I don't see anything can be done for reversing, say, 1.5GB string by a most naive way, byte by byte, since cache lines will be used at 2 "ends" of the string, and something different will be evicted from the caches.
0
 
JimBeveridgeCommented:
The L1/L2 cache area always used. It's not an "on-demand" thing.

If you want to learn more about cache line size, Google is your friend. For example, search for:
cache line size i7

I still suspect this question is a non-issue. Those 4GB have to come from somewhere, usually a disk. Compared to the disk speed, even the fastest disks, the CPU is hugely faster. So the time to read and write the data to some external media will dwarf any processing time.
0
 
JIEXACommented:
To JimBeveridge:
Citation "The L1/L2 cache area always used. It's not an "on-demand" thing."

Actually, I also thought this way, but Ulrich Drepper in his essay "What every programmer should know about memory" (http://www.akkadia.org/drepper/cpumemory.pdf) in section 6.1 described some cache bypassing techniques.
0
 
JimBeveridgeCommented:
Right. But that's an opt-out as opposed to an opt-in. There are several instructions that specifically affect cache control because you need them for synchronization across multiple processors.

As I said, if you really care about an optimum solution to this problem, you need to spend a lot of time with vTune and understand it at a visceral level. For most of us, this is a pointless question because reversing the data is a waste of time since (normally) you can just read/process it backwards instead of forwards.

for (auto i=buffer.rbegin(); i!=buffer.rend(); ++i)

I'm also not certain why you are still asking questions instead of getting your hands dirty and testing. I gave you the equation for the optimum and I gave you a way to saturate the memory bus. That's pretty much the end of the discussion. Once the memory bus is saturated, there's nothing you can do to make it go faster short of changing the hardware.
0
 
James BondSoftware ProfessionalAuthor Commented:
The solutions provided were good. Testing these solutions is time consuming and results will definitely vary from machine to machine. The question or problem or task that was asked also wasn't much experienced in the community and hence we dont have definite solution. But I got the right direction I needed.
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

  • 5
  • 3
  • 3
  • +3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now