
Petter Reinholdtsen: Legal to share more than 11,000 movies listed on IMDB?

I've continued to track down lists of movies that are legal to
distribute on the Internet, and have identified more than 11,000 title IDs
in The Internet Movie Database (IMDB) so far. Most of them (57%) are
feature films from the USA published before 1923. I've also tracked down
more than 24,000 movies I have not yet been able to map to an IMDB title
ID, so the real number could be a lot higher. According to the front
web page for Retro Film Vault, there are 44,000 public domain films,
so I guess there are still some left to identify.

The complete data set is available from a public git repository,
including the scripts used to create it.
Most of the data is collected using web scraping, for example from the
"product catalog" of companies selling copies of public domain movies,
but any source I find believable is used. I've so far had to throw
out three sources because I did not trust the public domain status of
the movies listed.

Anyway, this is the summary of the 28 collected data sources so
far:

2352 entries ( 66 unique) with and 15983 without IMDB title ID in free-movies-archive-org-search.json
2302 entries ( 120 unique) with and 0 without IMDB title ID in free-movies-archive-org-wikidata.json
195 entries ( 63 unique) with and 200 without IMDB title ID in free-movies-cinemovies.json
89 entries ( 52 unique) with and 38 without IMDB title ID in free-movies-creative-commons.json
344 entries ( 28 unique) with and 655 without IMDB title ID in free-movies-fesfilm.json
668 entries ( 209 unique) with and 1064 without IMDB title ID in free-movies-filmchest-com.json
830 entries ( 21 unique) with and 0 without IMDB title ID in free-movies-icheckmovies-archive-mochard.json
19 entries ( 19 unique) with and 0 without IMDB title ID in free-movies-imdb-c-expired-gb.json
6822 entries ( 6669 unique) with and 0 without IMDB title ID in free-movies-imdb-c-expired-us.json
137 entries ( 0 unique) with and 0 without IMDB title ID in free-movies-imdb-externlist.json
1205 entries ( 57 unique) with and 0 without IMDB title ID in free-movies-imdb-pd.json
84 entries ( 20 unique) with and 167 without IMDB title ID in free-movies-infodigi-pd.json
158 entries ( 135 unique) with and 0 without IMDB title ID in free-movies-letterboxd-looney-tunes.json
113 entries ( 4 unique) with and 0 without IMDB title ID in free-movies-letterboxd-pd.json
182 entries ( 100 unique) with and 0 without IMDB title ID in free-movies-letterboxd-silent.json
229 entries ( 87 unique) with and 1 without IMDB title ID in free-movies-manual.json
44 entries ( 2 unique) with and 64 without IMDB title ID in free-movies-openflix.json
291 entries ( 33 unique) with and 474 without IMDB title ID in free-movies-profilms-pd.json
211 entries ( 7 unique) with and 0 without IMDB title ID in free-movies-publicdomainmovies-info.json
1232 entries ( 57 unique) with and 1875 without IMDB title ID in free-movies-publicdomainmovies-net.json
46 entries ( 13 unique) with and 81 without IMDB title ID in free-movies-publicdomainreview.json
698 entries ( 64 unique) with and 118 without IMDB title ID in free-movies-publicdomaintorrents.json
1758 entries ( 882 unique) with and 3786 without IMDB title ID in free-movies-retrofilmvault.json
16 entries ( 0 unique) with and 0 without IMDB title ID in free-movies-thehillproductions.json
63 entries ( 16 unique) with and 141 without IMDB title ID in free-movies-vodo.json
11583 unique IMDB title IDs in total, 8724 only in one list, 24647 without IMDB title ID
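
For readers curious how counts like these can be derived, here is a minimal Python sketch. It is not the script from the git repository. It assumes each source has already been loaded into a list of dictionaries, and that an entry carries a hypothetical "imdb" key when it has been mapped to a title ID. It also assumes that "entries with" counts distinct title IDs per source, and that "unique" means the title ID appears in no other source.

```python
from collections import Counter


def summarize(sources):
    """Build summary lines for a dict mapping file name -> list of entries.

    Assumption: an entry is a dict, and the (hypothetical) key "imdb"
    holds the IMDB title ID when one is known.
    """
    # Count in how many sources each title ID shows up.
    seen_in = Counter()
    for entries in sources.values():
        for imdb_id in {e["imdb"] for e in entries if e.get("imdb")}:
            seen_in[imdb_id] += 1

    lines = []
    total_without = 0
    for name in sorted(sources):
        entries = sources[name]
        ids = {e["imdb"] for e in entries if e.get("imdb")}
        without = sum(1 for e in entries if not e.get("imdb"))
        # A title ID is "unique" to this source if no other source has it.
        unique = sum(1 for i in ids if seen_in[i] == 1)
        total_without += without
        lines.append(
            "%d entries (%d unique) with and %d without IMDB title ID in %s"
            % (len(ids), unique, without, name)
        )

    only_one = sum(1 for n in seen_in.values() if n == 1)
    lines.append(
        "%d unique IMDB title IDs in total, %d only in one list, "
        "%d without IMDB title ID" % (len(seen_in), only_one, total_without)
    )
    return lines
```

The "only in one list" total is simply the sum of the per-source "unique" counts, which is a handy consistency check on the summary above.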

I keep finding more data sources. I found the cinemovies source
just a few days ago, and as you can see from the summary, it extended
my list with 63 movies. Check out the mklist-* scripts in the git
repository if you are curious how the lists are created. Many of the
titles are extracted using searches on IMDB, where I look for the
title and year, and accept search results with only one movie listed
if the year matches. This allows me to automatically use many lists of
movies without IMDB title ID references, at the cost of increasing the
risk of wrongly identifying an IMDB title ID as public domain. So far my
random manual checks have indicated that the method is solid, but I
really wish all lists of public domain movies would include a unique
movie identifier like the IMDB title ID. It would make the job of
counting movies in the public domain a lot easier.
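
The acceptance rule described above (take a search hit only when it is the sole result and its year matches) can be sketched in Python as follows. This is an illustration only, not the code from the mklist-* scripts. The `search` argument is a hypothetical stand-in for whatever performs the IMDB lookup, and is assumed to return a list of (title ID, year) tuples.

```python
def pick_title_id(title, year, search):
    """Return an IMDB title ID for (title, year), or None if unsure.

    `search` is a hypothetical callable: it takes a title string and
    returns a list of (title_id, year) tuples for the movies found.
    """
    results = search(title)
    # Reject the lookup unless exactly one movie was listed.
    if len(results) != 1:
        return None
    title_id, found_year = results[0]
    # Reject the single hit if its year differs from the listed year.
    if found_year != year:
        return None
    return title_id
```

Being this strict means ambiguous titles end up in the "without IMDB title ID" column instead of being guessed, which trades coverage for a lower risk of mislabelling a movie as public domain.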

As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
