
I have a web application where users upload a text file. The application reads the file and, based on its contents, performs 30 different tasks, then shows the output to the user after a few seconds.

My approach is to write a PHP program that takes the text file and then calls different scripts (PHP and Unix shell scripts) to perform the 30 different tasks.

I have two questions:

  1. Can I run the 30 different scripts in parallel to save overall execution time? If so, should I use the pthreads extension in PHP for multi-threading? I have read in an article that multi-threading does not help a web application save execution time; it claims that although all the threads run in parallel, they take the same amount of time as if they ran in sequence.

  2. How would I know when all the threads have finished? If I keep polling for their completion, won't that be an overhead on system resources?

Arti

2 Answers


Using several threads can potentially reduce your overall execution time, but it can also make what you're trying to achieve much more complicated, so it should generally be avoided unless the speed improvement is really worth it to you.

In addition, if this is being triggered (as I suspect) by someone loading a web page, then using threading is almost certainly a bad plan. This is discussed in more detail here. If it's being triggered from the command line, such as:

php processTextFile.php

then it could potentially benefit you.

As to whether it will speed up your processing in this particular instance, that depends on what exactly you're doing.

From how you word your problem, you could be trying to do one of two things:

  1. Running a pipeline of tasks on the text file. In this context, the data you get back from the first task differs from the initial data, and you then run this new data through the second task. (For example, perhaps you have a JSON-encoded object and the first task decodes it into an array.) If you're doing this, then multi-threading will not help you, because each thread needs its input data at the moment you initialise it.

  2. Running your separate tasks all on the original file and then combining the processed data to return your results. In this instance, multi-threading would work.
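The difference between the two cases can be sketched in shell terms, using `tr` invocations as illustrative stand-ins for the real tasks (the commands and file names here are assumptions, not from the question):

```shell
#!/bin/sh
printf 'hello\n' > input.txt

# Case 1: a pipeline -- each stage consumes the previous stage's output,
# so the stages are inherently sequential.
tr 'a-z' 'A-Z' < input.txt | tr 'L' 'X' > pipeline_out.txt   # HEXXO

# Case 2: independent tasks -- each one reads the original file, so they
# can run side by side and be waited on together.
tr 'a-z' 'A-Z' < input.txt > upper_out.txt &   # HELLO
tr 'l' 'x' < input.txt > lower_out.txt &       # hexxo
wait
```

In case 1 the second `tr` cannot produce anything until the first has produced output; in case 2 both tasks start from the same unmodified input, which is what makes them parallelisable.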

If multithreading is still valid for your use case, then:

  • You can run the 30 scripts in parallel. You would tend to see a performance increase only on machines with multiple processors or multiple cores, as that allows the computer to use them at the same time.
  • You can use pthreads to achieve this (95% sure; this thread supports that view).
  • To work out how to actually use it, you might want to look at the examples on the GitHub project of the implementation.
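Since the question mentions both PHP and Unix scripts, the same fan-out/join pattern can also be sketched at the shell level, without pthreads: background each task with `&`, then let `wait` block until every one has exited. This also answers the second question, because `wait` reports completion without any polling. The sleep-based tasks below are placeholders for the 30 real scripts:

```shell
#!/bin/sh
# Three placeholder tasks standing in for the 30 scripts; each runs in
# its own background subshell.
for i in 1 2 3; do
  ( sleep 1; echo "task $i done" > "result_$i.txt" ) &
done

# wait blocks until every background child has exited, so the parent
# learns of completion without busy-checking.
wait
cat result_1.txt result_2.txt result_3.txt
```

Because the three tasks sleep concurrently, the whole script takes about one second rather than three.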
Doug

Some people, when confronted with a problem, think, "I know, I'll use threads," and then they have two problems.

Instead of attempting to parallelize those 30 scripts, I would recommend focusing on improving their performance. Of course, it depends on what specific tasks you are attempting to perform and how large the uploaded files are.

One common choke point is the drive's I/O. So if you can find a reasonable way to use a cache (for the PHP scripts) or a ramfs (for both the PHP and shell scripts), it might improve the speed of performing those tasks.
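As a sketch of the ramfs idea: on most Linux systems `/dev/shm` is a tmpfs (RAM-backed) mount, so a file copied there is served from memory rather than the drive. This is an assumption about the deployment environment, so the sketch falls back to `/tmp` when no such mount exists:

```shell
#!/bin/sh
# Use the RAM-backed /dev/shm mount if present; otherwise fall back to /tmp.
RAMDIR=/dev/shm
[ -d "$RAMDIR" ] || RAMDIR=/tmp

printf 'uploaded data\n' > "$RAMDIR/upload.txt"   # stand-in for the upload
# Each of the 30 tasks would now read "$RAMDIR/upload.txt" instead of an
# on-disk copy, avoiding repeated drive I/O.
wc -c < "$RAMDIR/upload.txt"
```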

That said, without actually knowing the specifics, it is really hard to give any non-vague advice :(

tereško