Free molecular flow in a cylindrical pipe – with multithreading

Posted on March 15th, 2016
Previous Article :: Next Article

view3You may have noticed many recent blog posts dealing with “contamination modeling”. While I started my scientific computing career with plasma simulations, these days much of my focus (and work projects) is spent on modeling contaminant transport. It’s like PIC, but without all the complicated plasma physics! There are two types of contaminants that are of interest: molecular and particulate. When it comes to molecular transport, designer of some sensitive sensor may want to know how much deposition to expect when placed in a vacuum chamber. Coming up with the answer requires computing gray body view factors between the target and all sources of contaminants. There are two ways of going about this: solving a system of linear equations, or performing Monte Carlo ray tracing. The ray tracing approach shares many similarities with PIC: we use sources to inject simulation particles, integrate their positions, perform surface interactions, and compute surface flux to objects of interest.

While this sounds simple, the devil is in the details. In this case, the detail is the surface interaction. The sensitive component will often be located outside a direct line of sight of sources, and it may take a molecule many bounces before it randomly finds its way to the sensor. Hence it is crucial that your code models these bounces correctly. Molecules reflecting from a rough surface bounce off such that flux follows cosine distribution about the surface normal. It’s quite easy to implement this diffuse reflection incorrectly! A common test is to consider conductance, or specifically transmission probability, of molecules through a cylindrical pipe. This problem was first addressed by Clausing (in 1932!). Clausing [1] tabulated his closed form formula for several simple geometries, including a pipe of varying L/D (length over diameter) ratios. Other researchers later came up with their own values for the “Clausing factor”, including DeMarcus and Berman. This latter data is tabulated in O’Hanlon [2]. The two solutions are very close but diverge slightly once L/D exceeds 5, as can be seen in Figure 1.

free molecular transmission factor in a cylindrical pipe
Figure 1. Transmission factor for various L/D ratios. The dots are from the simulation code in this article.

Simulation Code

In this article we’ll write a code for computing these transmission factors for free molecular flow in a cylindrical pipe. The pipe is defined analytically (as opposed to using a surface mesh), allowing us to consider arbitrary L/D ratios. The diameter D is set to 1, hence we simply vary the pipe length L. The pipe consists of three surfaces: inlet, wall, and outlet. The simulation proceeds as follows:

  1. Particles are continuously injected at the inlet at random position and velocity following the cosine law
  2. If a particle hits the wall, it is reflected from the point of impact with new velocity also sampled from the cosine law
  3. If a particle crosses the inlet or the outlet, the particle is removed and the appropriate counter inlet or outlet is incremented
  4. Simulation runs for a total of million time steps. Transmission factor is computed as outlet/(inlet+outlet)

The main loop is shown below:

    /*main loop, runs for num_ts time steps*/
    void mainLoop()
        for (ts=0;ts<num_ts;ts++)
        finished = true;

Below is the code for injecting particles. It uses lambertianVector function to sample a random unit vector about normal with the two tangents given by tang1 and tang2.

    /*injects np_per_ts particles at z=0 with cosine distribution about z axis*/
    void InjectParticles()
        double source_n[3] = {0,0,1};     //source normal direction
        double tang1[3] = {1,0,0};        //tangent 1
        double tang2[3] = {0,1,0};        //tangent 2
        for (long p=0;p<np_per_ts;p++)
            double pos[3];
            double vel[3];
            double rr;
            //select random position between [-R:R]x[-R:R], accept if in tube
            do {
                rr= pos[0]*pos[0]+pos[1]*pos[1];
                if (rr<=R*R) break;
            } while(true);
            pos[2]=0;    //z=0
            double n[3];        
            //multiply by drift velocity
            vel[0]= n[0]*vdrift;
            vel[1]= n[1]*vdrift;
            vel[2]= n[2]*vdrift;
            //add to dynamic storage
            particles.push_back(new Particle(pos,vel));

The particle mover is shown below. Particles are stored in a C++ linked list data structure. This data structure makes it easy to remove entries (killed particles) but is not as efficient to traverse as the consecutive array. We iterate through the particles using an iterator. Erasing a particle returns a new iterator and invalidates the existing one, hence the need for the while loop. The iterator is incremented for alive particles with it++. The code structure should be familiar to anyone having taken my PIC Fundamentals course. We check for wall impact by computing \(r=\sqrt{x^2+y^2}\). If \(r>R\), the cylinder radius, the particle hit a wall. We also check for particles leaving through inlet or the outlet by making sure that \(z\in(0,L)\). We check for wall impact first, otherwise the particle in the sketch above may be counted as leaving the tube on the final bounce. If a particle hit the wall, the parametric position of the impact is computed from t=(R-r0)/(r-r0) where r0 is the radial distance of the particle at the start of the push. The particle is then pushed for the remaining dt_rem.

    /*moves particles, reflecting ones hitting the tube wall*/
    void MoveParticles()
       double x_old[3];
       list<Particle *>::iterator it = particles.begin();
       double sum=0;
       while (it!=particles.end())
           Particle *part = *it;
           double dt_rem = dt;
           //while there is delta t left in particle push
               //save old position
               //new position, x = x + v*dt
               //did particle hit the wall?
               double rr = part->pos[0]*part->pos[0]+part->pos[1]*part->pos[1];
               if (rr>=R*R)
                   //compute intersection point
                   double r = sqrt(rr);
                   double r0 = sqrt(x_old[0]*x_old[0]+x_old[1]*x_old[1]);
                   double t = (R-r0)/(r-r0);
                   //push particle to the wall
                   //update remaining delta_t
                   dt_rem -= t*dt_rem;
                   //surface normal vector
                   double normal[3];
                   normal[0] = 0-part->pos[0];
                   normal[1] = 0-part->pos[1];
                   normal[2] = 0;
                   //get tangents
                   double tang1[3], tang2[3];
                   //cosine emission
                   double dir[3];
                   lambertianVector(dir, normal, tang1, tang2);
                   for (int i=0;i<3;i++)
               } //dt_rem
               else //no wall impact
                    //particle left through inlet or outlet?
                   if (part->pos[2]<=0 || part->pos[2]>=L)
                       if (part->pos[2]<0)
                       //kill particle
                       delete part;
                       it = particles.erase(it);
                       break;        //break out of dt_rem loop
                   else    //particle inside the cylinder
                       dt_rem = 0;
                       it++;    //go to next particle
           } //while dt_rem           
       }    //particle loop

Below is the “meat” of the simulation, the code that samples a random vector according to the cosine law.

    /*returns direction from cosine distribution about normal given two tangents*/
    void lambertianVector(double dir[3], double normal[3], double tang1[3], double tang2[3])
        //sample angle from cosine distribution
        double sin_theta = sqrt(rnd());     //sin_theta = [0,1)
        double cos_theta = sqrt(1-sin_theta*sin_theta);
        //random rotation about surface normal
        double phi = 2*PI*rnd();
        double vn[3],vt1[3],vt2[3];
        //add components
        for (int i=0;i<3;i++)
            dir[i] = vn[i]+vt1[i]+vt2[i];

The lambertianVector function requires besides the normal also two tangents. For a cylinder, there is probably a closed form for the tangents. However, the below code also works. It creates two randomly oriented tangent vectors from an arbitrary normal. It does this by first finding the spatial direction in which the normal has the largest component. It then creates a test vector in the “following” direction. As I am not using the right syntax, I am sure this is quite confusing. If the normal is mainly in the y direction, the test vector will be in the z direction. The two vectors are then crossed, giving us the first tangent, since the result of a cross-product is perpendicular to the two arguments. The second tangent is computed by crossing the first tangent with the normal.

    /*generates two tangent vectors, assumes normalized input*/
    void tangentsFromNormal(double n[3], double tang1[3], double tang2[3])
        //get maximum direction of the normal
        int max_i=0;
        double max = abs(n[0]);
        for (int i=1;i<3;i++) if (abs(n[i])>max) {max=abs(n[i]);max_i=i;}
        //create a test vector that is not parallel with norm
        double test[3] = {0,0,0};
        if (max_i<2) test[max_i+1]=1.0; else test[0]=1.0;
        //cross the two vectors, this will give the first tangent -perpendicular to both
        vec::cross(tang1,n,test);    //assuming n is already normalized
        //and the second vector//


In order to get reasonable results we need to simulate a large number of test particles. Since we are modeling free molecular flow, the particles don’t influence each other in any way. As such, we could trace each particle from start to finish before moving on to the next one, or push them all concurrently through small time steps as is done here. This also means that this code is embarrassingly parallel. In the previous post I showed you how to use graphics cards (GPUs) to parallelize plasma codes. In this post, I show you how to use multithreading. All modern CPUs contain multiple cores. For instance, my fairly mid-range laptop from 2014 features the Intel i7-4700HQ CPU. This CPU contains 4 cores and supports 8 concurrent executing threads. This means that instead of pushing just one set of particles at once, we can push more of them, theoretically up to eight. Multithreading is one of the main three ways of achieving parallelization that will be covered in the Distributed Computing course. It’s a shared memory approach, where bunch of simulations are running on the same system and thus have access to the same memory. C++11 natively supports multithreading with #include <thread>. This defines a std::thread object. Launching a new thread is quite trivial; we simply use syntax like std::thread(function, params), which will launch function(params) as a new thread. Such an approach works fine for simple functions. However, here we need to encapsulate data needed by each thread. In order to avoid data contention, we let each thread have its own set of particles. We also need each thread to have its own random number generator, since only one thread can access a generator at a time. Our data in encapsulated in a class called ConductanceXYZ. The listing is shown below. The mainLoop also belongs to this class and was discussed previously. Note how the constructor generates a new thread. The thread calls a static function start which is passed a ConductanceXYZ class pointer. The function then simply launches mainLoop in that class.

/*conductance class*/
class ConductanceXYZ 
    const double R=0.5;      //cylinder radius
    const double L;          //cylinder length
    const long np_per_ts;    //number of particles to create per time step
    const int num_ts;        //number of timesteps to run
    const double vdrift = 5;    //injection velocity
    list<Particle*> particles;  //list of particles
    const double dt=1e-2;    //simulation time step size
    int ts;        //current time step
    int inlet = 0;    //number of particles crossing inlet
    int outlet = 0;   //number of particles crossing outlet
    bool finished = false;    //set true once loop finishes
    /*constructor, initializes parameters and seeds random number generator*/
    ConductanceXYZ(double L, long np_per_ts, int num_ts):
        gen((unsigned int)chrono::system_clock::now().time_since_epoch().count()),
        L(L), np_per_ts(np_per_ts), num_ts(num_ts)    {
        thr = new thread(start, this);
    thread *thr;
    /*thread start*/
    static void start(ConductanceXYZ *p)   {p->mainLoop();}
    /*random number generator*/
    mt19937 gen;        
    double rnd() {
        uniform_real_distribution<double> rnd_dist(0.0,1.0);
        return rnd_dist(gen);
//	...

Below is the main function where everything starts. We first query the system for the number of supported concurrent threads. This becomes the default value, although it will severely slow down your computer as it prevents the operating system from running its own threads. We then use optional command line arguments to set the cylinder length and also the total number of simulation particles. The particle count per thread is then computed. Finally we compute the number of particles per thread per time step, np_per_ts. This value is floored at 1, meaning that the code will always generate at least 1 million (num_ts) particles per thread. We then create num_threads ConductanceXYZ classes and store these in a linked list. At this point, the main function doesn’t know that there are these other simulations running. If we don’t do anything else, the code would shortly terminate. Hence we need to add a do-while loop that keeps main alive while the threads are still running. The way we check is by first setting bool finished=true. Each thread has its own finished boolean which will be false while the time step loop is running. As we iterate through the threads, we perform a boolean “and”, finished &= sim.finished. Any non-true value will clear the boolean. We also reduce the inlet and outlet counters into a single global value. We use this to compute the transmission factor. As you can see in Figure 1, the results are quite good (there is a <1% error using 1e7 particles). But what about parallel performance? This is plotted below in Figure 2. Two cases are plotted: 1e8 and 2e8 particles. Times for the 2e8 case were divided by two to allow us to compare the trends. I thought that perhaps the efficiency will improve with a larger number of particles on more threads, but that was not the case. We can see that there is a huge improvement from 2 to 4 threads. However going past 4 threads doesn’t really bring much more speedup. With 8 threads, the parallel efficiency is only about 50%, meaning the code is running the same speed as a 100% efficient 4 threaded code (our code is however only about 74% efficient with 4 threads). There are probably multiple reasons for this. My system has only 4 cores, so the other 4 threads are probably some non-physical virtual threads. Also, I was running this on Windows with many applications opened, meaning there were many other processes competing for CPU time.

#include <iostream>
#include <iomanip>
#include <thread>
#include <random>
#include <list>
using namespace std;
/*********   MAIN *********/
int main(int nargs, char*args[])
  int num_threads = thread::hardware_concurrency();
  cout<<"Number of supported concurrent threads: "<<num_threads<<endl;
  double L=1;
  double part_totf = 1e7;	//double so we can use scientific notation
  //get parameters from command line if specified
  if (nargs>1) L=atof(args[1]);
  if (nargs>2) num_threads=atoi(args[2]);
  if (nargs>3) part_totf=atof(args[3]);
  //show active parameters
  cout<<"L = "<<L<<"\tnum_threads = "<<num_threads<<"\tpart_tot = "<<part_totf/1e6<<"mil"<<endl;
  long part_tot = (long)part_totf;  
  int num_ts = 1000000;	//number of time steps
  long part_per_thread = (long)(part_tot/((double) num_threads)+0.5);
  long np_per_ts = (long)(part_per_thread/(double)num_ts+0.5);
  if (np_per_ts<1) np_per_ts=1;	//make sure we get at least 1
  clock_t start = clock();    //start timer
  list<ConductanceXYZ> sims;    //storage for our simulations
  //create num_threads simulations
  for (int i=0;i<num_threads;i++)
      sims.emplace_back(L,np_per_ts, num_ts);
  //wait for threads to finish
  bool finished;
       long tot_gen = 0;
       long tot_out = 0;
       long tot_np = 0;
       int ts = num_ts+1;
       finished = true;
       //combine counts across threads and also check for completion
       for (ConductanceXYZ &sim:sims) {
            tot_gen += sim.inlet + sim.outlet;
            tot_out += sim.outlet;
            tot_np += sim.particles.size();
            if (sim.ts<ts) ts=sim.ts;
            finished &= sim.finished;    //boolean and, any false will clear it
       double K=0;
       if (tot_gen>0) K=tot_out/(double)(tot_gen); 
         cout<<"it="<<ts<<"\t np="<<setprecision(4)<<tot_np<<"\t K="<<K<<endl;
       //sleep for 50 milliseconds
    } while (!finished);
  clock_t end = clock();
  cout<<"Simulation took "<<(end-start)/((double)CLOCKS_PER_SEC)<<" seconds"<<endl;
  return 0;
Figure 2. Parallel efficiency

Source Code

You can download the code here: conductance.cpp. If using gcc, you need to compile with std=c++ flag. You can specify optional command line parameters L num_threads num_particles, for instance conductance.exe 0.4 4 1e7.

One note, if you are using Windows, do not run this under Cygwin. Cygwin still does not appear to support multithreading (if you know otherwise, let me know!). When running the code using GCC under Cygwin, I saw no improvement with multiple threads.


[1] Clausing, P, “The Flow of Highly Rarefied Gases through Tubes of Arbitrary Length”, Ann. Physik, (5) 12, 961, 1932
[2] O’Hanlon, J. F. A User’s Guide to Vacuum Technology, Wiley-Interscience, 2003

Leave a Reply

You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre lang="" line="" escaped="" cssfile=""> In addition, you can use \( ...\) to include equations.