Listing first elements of a huge directory using Python
At a client we have a huge directory of files. I wanted to list the first few files. ls -l | head took ages: by default ls reads and sorts all the entries before it prints anything, so head only gets to cut the output down at the very end. After my first attempts in Python failed, I wrote a Perl one-liner to list the first elements of a huge directory. However, I wanted to see if I could do it with Python in some other way.
using iterdir of pathlib
My first attempt in Python used the iterdir method of pathlib.Path.
examples/python/list_dir_using_iterdir.py
import pathlib

path = pathlib.Path("/home/gabor/work/code-maven.com/sites/en/pages/")

count = 0
for thing in path.iterdir():
    count += 1
    print(thing)
    if count > 3:
        break
On the real data it took 47 minutes to run.
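A likely explanation for the 47 minutes, based on CPython's source rather than on anything measured at the client: Path.iterdir is a generator, but in the CPython versions I have looked at it is built on os.listdir, which reads the whole directory into a list before the first path is yielded. The sketch below makes the difference visible on a small throwaway temporary directory (created only for this demo): os.listdir hands back a complete list, while os.scandir hands back an iterator that produces DirEntry objects on demand.

```python
import os
import tempfile

# Throwaway directory with a handful of empty files, only for the demo.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, f"file{i}.txt"), "w").close()

    # os.listdir reads the whole directory and returns a list of all names.
    names = os.listdir(tmp)
    print(type(names).__name__, len(names))  # list 5

    # os.scandir returns an iterator; entries are produced as we ask for them.
    with os.scandir(tmp) as it:
        first = next(it)
        print(type(first).__name__)  # DirEntry
```

On a directory with millions of entries, building that list is what costs the time, no matter how few names you intend to use.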
using walk of os
The second attempt used the walk function of the os module.
examples/python/list_dir_using_walk.py
import os
import sys

path = "/home/gabor/work/code-maven.com/sites/en/pages/"

count = 0
for dirname, dirs, files in os.walk(path):
    for filename in files:
        print(os.path.join(dirname, filename))
        count += 1
        if count > 3:
            # leave both loops at once
            sys.exit()
I don't know how long this would have taken; I stopped it after a minute.
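Why os.walk does not print anything early: for each directory it yields a tuple whose dirnames and filenames are complete lists, so it has to read the entire top directory before the body of the loop runs even once. A minimal sketch on a temporary directory, created only for illustration:

```python
import os
import tempfile

# Throwaway directory: five empty files and one sub-directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, f"file{i}.txt"), "w").close()
    os.mkdir(os.path.join(tmp, "subdir"))

    # The very first tuple from os.walk already holds every entry of tmp,
    # split into sub-directories and files.
    dirpath, dirnames, filenames = next(os.walk(tmp))
    print(dirnames)        # ['subdir']
    print(len(filenames))  # 5
```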
using scandir of os
Finally I found the scandir function of the os module. It returns an iterator that yields the entries as the directory is read, and that did the trick:
examples/python/list_dir_using_scandir.py
import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"

count = 0
with os.scandir(path) as it:
    for entry in it:
        print(entry.name)
        count += 1
        if count > 3:
            # leaving the loop also closes the scandir iterator
            break
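The walk version printed only files, while the scandir loop prints every entry, sub-directories included. If only files are wanted, one possible variation (my sketch, not from the original post) filters with DirEntry.is_file() and uses itertools.islice to stop after n matches; first_files is a hypothetical helper name. Keep in mind that "first" means the order in which the operating system returns the entries, not alphabetical order.

```python
import itertools
import os


def first_files(path, n):
    """Return the names of the first n regular files that scandir yields in path."""
    with os.scandir(path) as it:
        files = (entry.name for entry in it if entry.is_file())
        # islice stops consuming the generator after n items,
        # so we never iterate over the rest of the directory.
        return list(itertools.islice(files, n))


if __name__ == "__main__":
    print(first_files(".", 3))
```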
using scandir and a range
After getting a suggestion to improve my Perl solution, I thought I could use the same idea here too. This version assumes that there are at least 3 entries in the folder; otherwise fetching the next entry raises a StopIteration exception. Apart from that, it works.
examples/python/list_dir_using_scandir_range.py
import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"

with os.scandir(path) as it:
    for _ in range(3):
        # raises StopIteration if there are fewer than 3 entries
        print(next(it).name)
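To drop the at-least-3-entries assumption while keeping the range idea, the built-in next() accepts a default: next(it, None) returns None instead of raising StopIteration when the iterator is exhausted. A sketch, with first_names as a hypothetical helper name:

```python
import os


def first_names(path, n=3):
    """Return the names of up to n entries of path; fewer if the directory is smaller."""
    names = []
    with os.scandir(path) as it:
        for _ in range(n):
            entry = next(it, None)  # None instead of StopIteration when exhausted
            if entry is None:
                break
            names.append(entry.name)
    return names


if __name__ == "__main__":
    for name in first_names("."):
        print(name)
```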
Published on 2023-01-17