At a client we have a huge directory of files and I wanted to list just the first few of them. ls -l | head took ages, as ls first reads and sorts the whole directory and only then does head cut the output down. After my first attempts in Python failed, I wrote a Perl one-liner to list the first few entries of the huge directory. Still, I wanted to see whether I could do it in Python some other way.

using iterdir of pathlib

The original attempt in Python used the iterdir method of pathlib.Path.

examples/python/list_dir_using_iterdir.py

import pathlib

path = pathlib.Path("/home/gabor/work/code-maven.com/sites/en/pages/")
count = 0

# iterate over the entries and stop after printing the first few
for thing in path.iterdir():
    count += 1
    print(thing)
    if count > 3:
        break

On the real data it took 47 minutes to run, presumably because iterdir reads the whole directory listing before it starts yielding entries.

using walk of os

The second attempt was to use the walk function of the os module.

examples/python/list_dir_using_walk.py

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"
count = 0

# os.walk traverses the whole directory tree, descending into every subdirectory
for dirname, dirs, files in os.walk(path):
    for filename in files:
        print(os.path.join(dirname, filename))
        count += 1
        if count > 3:
            exit()

I don't know how long this would have taken; I stopped it after a minute. os.walk also descends into every subdirectory, which only adds to the work.
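If only the top-level directory matters, os.walk can be told not to recurse by emptying the dirs list in place. A minimal sketch of that idea (it still reads each directory's full listing before yielding, so it would not have rescued this attempt):

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"
count = 0

for dirname, dirs, files in os.walk(path):
    # emptying the list in place tells os.walk not to descend into subdirectories
    dirs[:] = []
    for filename in files:
        print(os.path.join(dirname, filename))
        count += 1
        if count > 3:
            break
    break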

using scandir of os

Finally I found the scandir function of the os module. That did the trick:

examples/python/list_dir_using_scandir.py

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"
count = 0

# os.scandir returns a lazy iterator of os.DirEntry objects
with os.scandir(path) as it:
    for entry in it:
        print(entry.name)
        count += 1
        if count > 3:
            exit()
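
The entries yielded by os.scandir are os.DirEntry objects, so besides name they also offer path and the cheap is_file() / is_dir() checks. A small sketch, assuming we only care about regular files in the same directory:

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"
count = 0

with os.scandir(path) as it:
    for entry in it:
        # DirEntry caches the file type from the directory scan,
        # so is_file() usually does not need an extra stat call
        if not entry.is_file():
            continue
        print(entry.path)
        count += 1
        if count > 3:
            break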

using scandir and a range

After getting an improvement suggestion for my solution in Perl I thought I could use the same idea here too. I assume that there are at least 3 elements in this folder, otherwise I'll get a StopIteration exception when calling next, but apart from that this works.

examples/python/list_dir_using_scandir_range.py

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"

with os.scandir(path) as it:
    for _ in range(3):
        # next() raises StopIteration if the iterator runs out of entries
        print(next(it).name)
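
Alternatively, itertools.islice takes at most the first few entries and simply stops early if the directory has fewer, so the StopIteration concern goes away. A sketch along the same lines:

import os
from itertools import islice

path = "/home/gabor/work/code-maven.com/sites/en/pages/"

with os.scandir(path) as it:
    # islice stops after 3 entries or when the directory runs out, whichever comes first
    for entry in islice(it, 3):
        print(entry.name)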