Category Archives: Image Archive

Public Domain Day Celebrates Creative Works from 1928

Hundreds of people from all over the world gathered together on January 25 to honor the thousands of movies, plays, books, poems and songs that recently entered the U.S. public domain.

Steamboat Willie, Walt Disney’s 1928 animated film featuring Mickey Mouse, had top billing at the virtual event. Literature now free from restriction for reuse includes Orlando by Virginia Woolf and Tarzan, Lord of the Jungle by Edgar Rice Burroughs. Sound recordings from 1923 (released on a different schedule) also joined the public domain, including “Down Hearted Blues” by Bessie Smith and “Who’s Sorry Now” by the Isham Jones Orchestra.

“There’s so much to rediscover and to celebrate,” said Jennifer Jenkins, director of the Center for the Study of the Public Domain at Duke Law School. For example, the release of The Great Gatsby into the public domain in 2021 inspired a creative flurry: new versions of the novel from the perspective of different characters, a prequel telling the backstory of Nick Carraway, a young adult remix, and a song. “From the serious to the creative, to the whimsical to the wacky, these are all the great things we can do…now that [these works] are in the public domain and free to copy, to share, to digitize and to build upon without permission or fee.”

For an overview of new works in the public domain, view the curated list from the Center for the Study of the Public Domain.

Remix Contest

The winning film from the Public Domain Day 2024 Remix Contest was shown as well: “Sick on New Year’s,” by Ty Cummings. Every year since 2021, the Internet Archive has invited artists to remix works from its collection to showcase new and creative uses of public domain materials. Fifty films were submitted to this year’s competition, according to Amir Esfahani, artist in residence at the Archive. Learn more about the finalists or watch all the submissions in our recent blog post.

Advocacy

“Celebrating the public domain is not just about vintage references and period-appropriate clothing. It’s about understanding history to inform the present day,” said Lila Bailey, Internet Archive senior policy counsel and co-host of the virtual festivities. “We think there should be time set aside every year to celebrate the immense riches that free and open culture provides to everyone.”

While federal holiday recognition (like MLK Day or Presidents’ Day) for the public domain is unlikely, there was a discussion of an advocacy campaign for establishment of a commemorative Public Domain Day (more along the lines of National Data Privacy Day or National Whistleblowers Day).

“It only requires a simple resolution in the Senate with high chances of recognition,” said Amanda Levendowski, director of Georgetown Law School’s Intellectual Property and Information Policy Clinic. “Prospects for passage are way better than possible. About 80 percent of proposals are passed — and maybe next year, Public Domain Day will be among them.”

Experts said a successful drive for the designation will require a collaborative effort. A kickoff event will be held February 29 in New York City, hosted by Library Futures, executive director Jennie-Rose Halperin announced.

AI and the Public Domain

The online program also featured a panel discussion on generative artificial intelligence, copyright and artist expression. Experts weighed in on just what should be the copyright status of the outputs of generative AI.

Panelists (clockwise from top left): Lila Bailey (Internet Archive), Heather Timm (artist), Maxximillian (artist), Matthew Sag (Emory Law), and Juliana Castro Varón (Cita Press).

AI tools can now turn text or simple descriptions into images that are genuinely new and that often look exactly like the kind of work that would be copyrighted if a human had made it, explained Matthew Sag, professor of law, artificial intelligence, machine learning, and data science at Emory University.

“The copyright office is quite clear that to get copyright, you have to have human authorship. So something created entirely by an unsupervised machine is not eligible for copyright,” Sag said, noting that the courts have recently agreed. “The interesting question is what about when humans are using AI as a tool and directing the output. This is where the controversy really is.”

On the panel, two artists, Heather Timm and Maxximillian, shared how they both leverage AI in the creative process.

Timm said she started using generative AI in 2021 and thinks the copyright office should cover works produced with it. She has trained AI models on her own physical work and then created something new in collaboration with the machine, and has used AI to conceptualize how to blend different pieces of work in a collage or sculpture.

“I use it almost as a notebook,” Timm said. “If I have a concept or an idea about something on the go, I can immediately prompt that and have it as a placeholder to explore it later.”

As a filmmaker and musician, Maxximillian said she feels passionate about AI, which has saved her time in creating animated characters and refining her text. “As a professional artist, I rely on copyright to keep viable the works that I produce for clients legally,” said Maxximillian. “It’s important to understand that copyright protection enables the creator to be a steward of that work. The question to consider: Who benefits by denying copyright on AI? I think nobody benefits.”

An open access publisher, Juliana Castro Varón, design director and founder of Cita Press, also addressed the issue. “I believe that AI may pose economic, power, and labor challenges, but I feel very confident that creativity will survive technology,” she said. All books Cita produces are in the public domain for everyone to download. “We are not at all against people using AI for their work, but we continue to hire humans…elevating the work of people is core to our mission.”

***

The event was co-hosted by Internet Archive and Library Futures with support from Creative Commons, Authors Alliance, Public Knowledge, SPARC and Duke Law’s Center for the Study of the Public Domain.

Press conference statement: Lawrence Lessig, Harvard Law

Lawrence Lessig is a professor of law at Harvard Law. Lawrence spoke at the press conference hosted by Internet Archive ahead of oral argument in Hachette v. Internet Archive.

Statement

Also available at Lessig for the Internet Archive on Medium.

We should all recognize — and celebrate—the importance of commercial publishing, for authors and creators everywhere. Commercial publishing creates the income that authors depend upon to have the freedom to create great new works. Without commercial publishing, much of the greatest that will be won’t be written.

But we must also recognize that culture needs more than commercial publishing. If the business model of commercial publishing controlled our access to our past, then much of who we were, and much of how we learned to be better, would simply disappear.

Think about the extraordinary platform that is Netflix. For a small price each month, subscribers have access to thousands of movies and television shows, far more than at any point in human history. The revenue subscribers provide in turn lets Netflix invest in new creative work. No one who knew television from the 1970s could believe that the quality of television has not improved dramatically over the last 50 years.

Yet Netflix’ archive is not endless. And each year, the site culls titles from its collection and removes them from its library. Netflix does this for many reasons — for example, the content could be licensed from third parties, and the term of that license has expired, or the site may see that demand for the title is meager, so bearing the costs of carrying it no longer makes sense. Regardless of the reason, the decision is an economic one for Netflix — Netflix makes available only those titles that it continues to make economic sense to make available. Such is the business model of a commercial publisher.

But culture needs a different business model. We need access to our past, not just the part of our past that continues to be commercially viable. We need libraries that assure we can see everything our parents or grandparents saw, so we can understand why they were as they were, and how they got better. Great libraries preserve access to as much as they physically can — not based on which titles continue to earn revenue. The past is just one more competitor for a commercial publisher; but for a library, the past is a gift that is to be nurtured and protected, regardless of its commercial value.

We are at a critical moment in the history of culture. The lawsuit that the Internet Archive faces will determine whether the business model of culture is the commercial model alone, or whether there will continue to be a place for libraries, and therefore, continue to be a practice of assuring as much access to our past as is possible. The particular fight in this lawsuit is important; the general fight is critical: Is the past that we have access to just the past that continues to pay? Or is the past we can have reliable access to the past that libraries strive to make available — not for profit, but for the love of culture and for the truth that that access to all of culture continues to assure.

Meet the Librarians: Alexis Rossi, Media & Access

To celebrate National Library Week 2022, we are taking readers behind the scenes to Meet the Librarians who work at the Internet Archive and in associated programs.


Alexis Rossi has always loved books and connecting others with information. After receiving her undergraduate degree in English and creative writing, she became a book editor and then worked in online news. 

Alexis Rossi

In 2006, Rossi joined the staff of the Internet Archive. She was working on the launch of the Open Library project when she recognized the need to learn more about how to best organize materials. She enrolled at San Jose State University and earned her Master’s of Library and Information Science in 2010.

“It gave me a better grasp of how to hierarchically organize information in a way that is sensible and useful to other libraries,” Rossi said. “It also gave me better familiarity with how other more traditional libraries actually work—the types of data and systems they use.”

Rossi concentrated on web interfaces for library information, understanding digital metadata, and how to operate as a digital librarian. In addition to overseeing the Open Library project, at the Internet Archive, Rossi managed a revamp of the organization’s website, ran the Wayback Machine for four years, founded the webwide crawling program, and is currently a librarian and director of media & access.

“One of the themes of my life is trying to empower people to do whatever they want to do,” said Rossi, who grew up in Monterey, California, and now lives in San Francisco. “Giving people the resources to teach themselves—whatever they want to learn—is my driving force.”

Rossi acknowledges she is privileged to have the means to avail herself of an abundance of information, while many in other parts of the world do not. There are so many societal problems she cannot solve, Rossi said, but she believes her work is making a contribution.

“We can build a library that allows people to access information for free, wherever they are, and however they can get to it, in whatever way. That, to me, is incredibly important,” Rossi said. It’s also rewarding to help patrons discover new information and recover materials they may have thought were lost, she added.

When she’s not working, Rossi enjoys making funky jewelry and elaborate cakes (a skill she learned on YouTube).

Among the millions of items and collections in the Internet Archive, what is Rossi’s favorite? Video and audio recordings of her dad, now 73, playing the piano, organ and accordion: “It’s just so good. It’s such a perfect little piece of history.”

Boston Phoenix Rises Again With New Online Access

For more than 40 years, The Boston Phoenix was the city’s largest alternative weekly, covering local politics, arts, and culture.

The Boston Phoenix, Volume 2, Issue 44 – October 30, 1973

“It was really a pretty legendary paper. The style of the writing and the quality of writers were nationally known,” said Carly Carioli, who started at the newspaper as an intern in 1993 and became its last editor-in-chief.

With the advent of online advertising, it struggled like many independent newspapers to compete. In 2013, the Phoenix folded.

After the publication shut down, owner Stephen Mindich wanted the public to be able to access back issues of the Phoenix. The complete run of the newspaper from 1973 to 2013 was donated to Northeastern University’s special collections. The family signed copyright over to the university.

Librarians led a crowdsourcing project to create a digital index of all the articles and authors, which was helpful for historians and others in their research, said Giordana Mecagni, head of special collections and university archivist. Northeastern had inquired about digitizing the collection, but it was cost prohibitive. 

As it turns out, the Internet Archive owned the master microfilm for the Phoenix and it put the full collection online in a separate collection: The Boston Phoenix 1973-2013. Initially, the back issues were only available for one patron to check out at a time through Controlled Digital Lending. Once Northeastern learned about the digitized collection, it extended rights to the Archive to allow the Phoenix to be downloaded without controls.

Read The Boston Phoenix at the Internet Archive

“All of a sudden it was free to the public. It was wonderful,” Mecagni said. “We get tons and tons of research requests for various aspects of the Phoenix, so having it available online for free for people to download is a huge help for us.”

Inquiries range from someone trying to track down a classified ad through which they met their spouse to an individual looking up an article about a band. The paper was a leader in writing groundbreaking stories about the LGBTQ community, the AIDS crisis, race and the Vietnam War, issues often not covered in the mainstream press. “Making that coverage public is adding an immense amount to the historical record that would not be there otherwise,” said Carioli. He said he appreciates the preservation and easy access to back issues, as do other journalists, researchers and academics.

“It’s a dream come true,” said Carioli of the Internet Archive’s digitization of the newspaper. “The Phoenix was invaluable in its own time, and I think it will be invaluable for a new generation who are just discovering it now. It was a labor of love then and the fact that it’s online now is huge for Boston, but also for anyone who’s interested in independent media and culture.”

Pretend you’re here with Internet Archive Zoom backgrounds

Have you seen these gorgeous library backgrounds you can use to pretend you’re amongst the smell of old books and hushed page turning?

When I saw them I got a little jealous and thought, “computers are just as soothing!” So without further ado, welcome to your Internet Archive virtual Zoom backgrounds.

We’ve got a pretty majestic building you could sit in front of. There’s free wifi.

Or you can come inside and sit in the Great Room with us, stained glass dome and all.

Sit quietly amongst the pews with our little Internet Archivist sculptures by Nuala Creed.

Or have them be your backup dancers / Greek chorus on all your calls.

You can sit amongst the films waiting to be digitized.

Or pretend to be digitizing them yourself.

Scan books seated in front of a Table Top Scribe.

Or sit with the constant hum of busy servers in the background.

Images of Afghanistan 1987-1994

Afghan Media Resource Center’s correspondent interviewing a Muj Commander, 1991

Journalists and others risk their lives to keep the public informed in times of conflict. War imagery provides us with important information in the moment, and creates a trove of invaluable archival content for the future.

Please be aware that this collection contains some disturbing photos of violence and its aftermath (though we have not included any in this blog post).

The Afghan Media Resource Center (AMRC) was founded in Peshawar, Pakistan, in 1987, by a team of media trainers working under contract to Boston University. The goal of the project was to assist Afghans to produce and distribute accurate and reliable accounts of the Afghan war to news agencies and television networks throughout the world. Beginning in the early 1980s, amidst a news blackout imposed by the Soviet-backed Kabul government, foreign journalists had become targets to be captured or killed. The AMRC was an effort to overcome the substantial obstacles encountered by media representatives in bringing events surrounding the Afghan-Soviet war to world attention.

An armed Muj posing for the camera, 1988

Beginning in 1987, a series of six-week training sessions was conducted at the AMRC’s original home in University Town, Peshawar, Pakistan. Qualified Afghans were recruited from all major political parties, all major ethnic groups and all regions of Afghanistan, to receive professional training in print journalism, photo journalism and video news production. Haji Sayed Daud, a former television producer and journalist at Kabul TV before the Soviet invasion, was named AMRC Director.

After the completion of their training, 3-person teams were dispatched on specific stories throughout Afghanistan’s 27 provinces, with 35mm cameras, video cameras, notebooks, and audio tape recorders. Photo materials were distributed internationally through SYGMA and Agence France-Presse (AFP). Video material was syndicated and broadcast by VisNews (now Reuters), with 150 broadcasters in 87 countries, Euronews and London-based WTN (now Associated Press), Thames Television, ITN, Swedish, French, Pakistani and other regional networks.

A young girl carrying clean drinking water, 1989

In 2000 AMRC began publishing a popular and influential newspaper in Kabul: ERADA (Intention). With one interruption, ERADA publication continued until 2012.

Beyond the AMRC archive, the AMRC conducted dozens of training programs and workshops for writers and radio journalists, including training programs for Refugee Women in Development (REFWID). The AMRC also established radio and TV studios in the provincial capital, Jalalabad, and produced radio and TV programs, including educational radio dramas, for a variety of international organizations. AMRC also conducted public opinion polls in Afghanistan, including an extensive Media Use Survey in Afghanistan, financed by InterMedia, a Washington, D.C. group.

Armed Muj pulling out an unexploded missile, 1989

The AMRC collection spans a critical period in Afghanistan’s history (1987-1994) and includes 76,000 photographs, 1,175 hours of video material, 356 hours of audio material, and many stories from print media.

An Afghan weaving carpet, 1990

In 2012 AMRC received a grant to digitize the entire AMRC archive, to preserve the collection at the U.S. Library of Congress. AMRC senior media advisors Stephen Olsson and Nick Mills were trained in the digitization processes by the Library of Congress, then spent two weeks in Kabul training the AMRC staff. The digitization and metadata sheets (in English, Dari and Pashto) were completed in 2016, and were welcomed into the Library of Congress with a formal ceremony.  We are now making the entire AMRC collection available through our on-line partner, The Internet Archive.

Now the entire collection is readily available to scholars, researchers and publishers.  All royalties for commercial use of the photo images and video material will continue to support the non-profit work of the AMRC.

30 Days of Stuff

Jason Scott, free-range archivist, reporting in as 2017 draws to a close.

As part of our end-of-year fundraising drive, I thought it might be fun to tweet highlighted parts of the vast stacks of content that the Internet Archive makes available for free to millions. A lot of folks know about our Wayback Machine and its 20+ years of website history, but there’s petabytes of media and works available to see throughout the site. I called it “30 Days of Stuff”, and for the last 30 days I’ve been pointing out great items at the Archive, once a day.

You won’t have to swim upstream through my tweets; here on the last day, I’ve compiled the highlighted items in this entry. Enjoy these jewels in the Archive’s collection, a small sample of the wide range of items we provide.

Books and Texts

  • The Latch Key of my Bookhouse was one of the first books scanned by the Internet Archive in its book scanner tests, and it’s a 1921 directory of Children’s Literature that is filled with really nice illustrations that came out great.
  • As part of our ever-growing Defense Technical Information Center collection, we have The Role of the Citizens Band Radio Service and Travelers Information Stations In Civil Preparedness Emergencies Final Report, a 1978 overview of CB Radio and what role it might play in civil emergencies. Many thousands of taxpayer-funded educational and defense items are mirrored in this collection.
  • Also in the DTIC collection is The Battalion Commander’s Handbook 1980, which besides the crazy front page of stamps, approvals and sign-offs, is basically a manager’s handbook written from the point of view of the US Army.
  • There are hundreds of tractor manuals at the Archive. Hundreds! Of all types, languages (a lot of them Russian) and level of information. Tractors are one of those tools that can last generations and keeping the maintenance on them in the field can make a huge difference in livelihood.
  • A lovely 1904 catalog for plums called The Maynard Plum Catalogue was scanned in with one of our partner organizations and it’s a breathless and inspiring declaration of the future wonder of the plums this wizard of plum-growing, Luther Burbank, was bringing to the world.
  • Xerox Corporation released “A Metamorphosis of Creative Copying” in 1964, which seems to function as both promotion for Xerox and a weird gift to give to your kids to color in.
  • In 2014, a short zine called The Tao of Bitcoin was released, telling people the dream of $10,000 bitcoin would be real.
  • The 1888 chapbook Goody Two-Shoes has lovely illustrations, and a fine short story.
  • Working with a lovely couple who brought in a 1942 black-owned-businesses directory, I scanned the pages by hand and put them up into this item.
  • Inside that directory was an ad for a school of whistling that said it taught using the methods of Agnes Woodward, and a quick scan of the Archive’s stacks showed that we had an entire copy of her book Whistling as an Art!
  • The medical treatise Sleep and Its Derangements, from 1869, is William A. Hammond, MD’s overview of sleep, and what can go wrong. Scanned from the Francis A. Countway Library of Medicine, it’s one of many thousands of books we’ve scanned with partners.
  • Let Hartman Feather Your Nest could be described as “A furniture catalog” in the same way the Sistine Chapel could be described as “a place of worship”. The catalog is a thundering, fist-pounding declaration of the superiority of the Hartman enterprise and the quality and breadth of furniture and service that will arrive at your door and be backed up to the far reaches of time.

Magazines

  • Photoplay considered itself the magazine for the motion picture industry in the first part of the 20th century, and this multi-volume compilation of photos, articles and advertisements is a truly lovely overview.
  • There’s over 140 issues of the classic Maximum RockNRoll zine, truly the king of music zines for a very long time. On its newsprint pages are howls and screeches of all manner of punk, rock and the needs of musicians.
  • A magazine created by the Walt Disney Company to trumpet various parts of Disneyland and its attractions was called Vacationland, and this Fall 1965 issue covers all sorts of stuff about the park’s first decade.

Movies

  • Rescued from a warehouse years ago, a collection of Hollywood movie “B-Roll”, unused secondary scenes often filmed by different crew, has been digitized. My personal favorite is [Western Film Scenes], which is circa 1950s footage of a Western Town, all of it utterly fake but feeling weirdly real, to be used in a western. Don’t miss everyone standing around looking right at you and looking like they agree quite energetically with you!
  • No compilation could be complete without the legendary Duck and Cover, a cartoon/PSA that explained the simple ways to avoid injury in a nuclear blast. Just lie down! It’ll be fine. Please note: This Probably Won’t Work. But the song is very catchy.
  • The very weird Electric Film Format Acid Test from 1990 has a semi-interested model holding up a color bar plate in a wide, wide variety of film and video formats. Filmed just a few blocks away from the Internet Archive’s current headquarters.
  • I snuck in a 1992 interview with the Archive’s founder, Brewster Kahle, back when he was 33 and working at WAIS, a company or two before the Archive, in which he is asked about his thoughts on information and gathering of data. It’s quite interesting to hear the consistency of thought.
  • The Office of War Information worked with Disney to create “Dental Health”, a film to show to troops about proper dental care. It’s a combination of straightforward animation and industrial film-making worth enjoying.

Audio

  • We have a collection of hours of the radio show The Shadow from 1938-1939, starring Orson Welles at 23, at the height of his performance powers, playing the dual main role.
  • For Christmas Eve, we pointed to “Christmas Chopsticks”, a 1953 78rpm record of “Twas the Night Before Christmas” performed to the tune of the classic piano piece “Chopsticks”; one of tens of thousands of 78rpm records the Archive has been adding this year.
  • On Christmas, a user of the Archive uploaded two obscure albums he’d purchased on eBay – remnants of the S. S. Kresge Company, which became K-Mart, and which were played over the PA system for shoppers. He got his hands on Albums #261 and #294.
  • Earlier in the month before the user uploaded those Christmas albums, I linked to a different holiday collection of K-Mart items, a 1974 Reel-to-Reel that started with a K-Mart jingle and went full holiday from there.
  • Before he was a (retired) talk show host, and before he was a stand-up comedian, David Letterman worked and trained in radio. Happily, we have recordings of Dave Letterman, DJ, from when he was 22, at Ball State University.
  • Ron “Boogiemonster” Gerber has been hosting his weekly pop music recycling radio show, “Crap from the Past”, for over 25 years, and he’s been uploading and cataloging his show to the Archive for well over 10 of those years, including all the way back to the beginning of his show. The full Crap From The Past archive is up and is hundreds of hours of fun.
  • The truly weird “Conquer the Video Craze” is a 1982 record album with straightforward descriptions of how to beat games like Centipede, Defender, Stargate, Dig Dug, and more. This album has been sampled from by multiple DJs to bring that extra spice to a track.
  • Over 3,000 shows at the DNA Lounge are at the archive, including “Bootie: Gamer Night”, which combines mash-up tracks and video games. Bootie has been playing at DNA Lounge for years, and puts the audio from one song with the singing from another, and… it’s quite addicting, like games. This night was for the nearby Game Developers’ Conference being held the same week.

Software

  • In 2011, as part of a “retrocomputing” competition, we saw the release of “Paku-Paku”, a pac-clone program which ran in an obscure early PC-Compatible graphics mode that was very colorful and very small (160×100) and was built perfectly for it. You can play the game in your browser by clicking here.
  • Psion Chess is a game for the Macintosh that can play both you and itself with pretty high levels of skill and really sharp and crisp black and white graphics.  It makes a really great screensaver in self-playing mode.

People often overuse a phrase like “Barely scratched the surface”, but I assure you there are millions of amazing items in the archive, and it’s been a pleasure to bring some to light. While the 30 Days of Stuff was a fun way to stretch out a month of fundraising with stuff to see every day, we’re here 24/7 to bring you all these items, and welcome you finding jewels, gems and clunkers throughout our hard drives whenever you want.

Thanks for another year!

archive.org download counts of collections of items: updates and fixes

Every month, we look over the total download counts for all public items at archive.org.  We sum item counts into their collections.  At year end 2014, we found various source reliability issues, as well as overcounting for “top collections” and many other issues.

archive.org public items tracked over time

To address the problems, we:

  • Rebuilt a new system to use our database (DB) for item download counts, instead of our less reliable (and more prone to “drift”) SOLR search engine (SE).
  • Changed monthly saved data from JSON and PHP serialized flatfiles to new DB table — much easier to use now!
  • Fixed overcounting issues for collections: texts, audio, etree, movies
  • Fixed various overcounting issues related to not unique-ing <collection> and <contributor> tags (more below)
  • Fixes to character encoding issues on <contributor> tags

Bonus points!

  • We now track *all collections*.  Previously, we only tracked items tagged:
    • <mediatype> texts
    • <mediatype> etree
    • <mediatype> audio
    • <mediatype> movies
  • For items we are tracking <contributor> tags (texts items), we now have a “Contributor page” that shows a table of historical data.
  • Graphs are now “responsive” (scale in width based on browser/mobile width)

 

The Overcount Issue for top collection/mediatypes

  • In the below graph, mediatypes and collections are shown horizontally, with a sample “collection hierarchy” today.
  • For each collection/mediatype, we show 1 example item, A B C and D, with a downloads/streams/views count next to it parenthetically.   So these are four items, spanning four collections, that happen to be in a collection hierarchy (a single item can belong to multiple collections at archive.org)
  • The Old Way had a critical flaw — it summed all sub-collection counts — when really it should have just summed all *direct child* sub-collection counts (or gone with our New Way instead)

overcount

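So we now treat <mediatype> tags like <collection> tags for counting purposes, and we unique all <collection> tags on each item. The unique-ing avoids another kind of overcounting, where an item with slightly nonideal data (for example, the same tag listed twice) was counted more than once in one collection.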
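As a rough illustration of the counting rule just described, here is a minimal Python sketch. It is not the production code: the item names, download numbers, and the `collection_totals` function are all hypothetical, and it assumes each item simply carries one mediatype plus a list of collection tags.

```python
from collections import defaultdict

# Hypothetical sample data: item id -> (downloads, mediatype, collection tags).
ITEMS = {
    "A": (10, "texts", ["americana"]),
    "B": (20, "texts", ["americana", "toronto"]),
    "C": (30, "texts", ["americana", "toronto", "toronto"]),  # repeated tag
}


def collection_totals(items, unique_tags=True):
    """Sum each item's downloads into every collection/mediatype it names."""
    totals = defaultdict(int)
    for downloads, mediatype, collections in items.values():
        # The mediatype is counted exactly like a collection tag.
        tags = [mediatype] + collections
        if unique_tags:
            # Unique the tags so a repeated tag is only counted once.
            tags = set(tags)
        for tag in tags:
            totals[tag] += downloads
    return dict(totals)


fixed = collection_totals(ITEMS)
naive = collection_totals(ITEMS, unique_tags=False)
print("unique tags:", fixed)
print("raw tags:   ", naive)
```

With unique-ing switched off, item C's repeated tag inflates the "toronto" total; with it on, every item contributes to each of its collections exactly once, however deep the collection hierarchy is.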

 

… and one more update from Feb/1:

We graph the “difference” between absolute downloads counts for the current month minus the prior month, for each month we have data for.  This gives us graphs that show downloads/month over time.  However, values can easily go *negative* with various scenarios (which is *wickedly* confusing to our poor users!)

Here’s that situation:

A collection has a really *hot* item one month, racking up downloads in a given collection. The next month, a DMCA takedown or some other action removes the item from being available (and thus from being counted in the future). The downloads for that collection can plummet in the next month’s run, when the counts are summed over public items for that collection again. So that collection would have a negative (net) downloads count change for this next month!

Here’s our fix:

Use the current month’s collection “item membership” list for both the current month *and* the prior month. Sum counts for all those items for each month, and graph the difference between the two sums. In just about every situation that remains, graphed monthly download counts will be nonnegative, so the cumulative totals behind them are monotonic (increasing or flat).
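A small Python sketch of the takedown scenario and the fix may help. Everything here is hypothetical (the item names, the numbers, and both function names), and it assumes an item with no count in the prior month should be treated as having had zero downloads.

```python
# Hypothetical cumulative download counts per item in one collection,
# as recorded by two consecutive monthly runs.
PRIOR = {"hot-item": 5000, "item-1": 100, "item-2": 40}
CURRENT = {"item-1": 130, "item-2": 40}  # "hot-item" was taken down


def naive_monthly_change(prior, current):
    """Old approach: each month is summed over its own set of public items."""
    return sum(current.values()) - sum(prior.values())


def fixed_monthly_change(prior, current):
    """Fix: use the current month's membership list for both months."""
    members = list(current)
    current_sum = sum(current[item] for item in members)
    # Assumption: an item with no prior count had zero downloads last month.
    prior_sum = sum(prior.get(item, 0) for item in members)
    return current_sum - prior_sum


print(naive_monthly_change(PRIOR, CURRENT))
print(fixed_monthly_change(PRIOR, CURRENT))
```

The naive difference goes sharply negative because the removed item's downloads vanish from the current sum only. Restricting both sums to the current membership leaves just the downloads that the remaining items gained during the month.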

 

 

Archive-It Team Encourages Your Contributions To The “Occupy Movement” Collection

Since September 17th, 2011 when protesters descended on Wall Street, set up tents, and refused to move until their voices were heard, an impassioned plea for economic and social equality has manifested itself in similar protests and demonstrations around the world. Inspired by “Occupy Wall Street (OWS)”, these global protests and demonstrations are collectively now being referred to as the “Occupy Movement”.

In an effort to document these historic, and politically and socially charged, events as they unfold, IA’s Archive-It team has recently created an “Occupy Movement” collection to begin capturing information about the movement found online. With blogs communicating movement ideals and demands, social media used to coordinate demonstrations, and news related websites portraying the movement from a dizzying variety of angles, the presence and representation of the Occupy Movement online is both hugely valuable to our understanding of the movement as a whole and constantly in flux and at risk.

The value of the collection hinges on the diversity, depth, and breadth of our seeds and websites we crawl. We are asking and encouraging anyone with websites they feel are important to archive, sites that tell a story about the movement, to pass them along and we will add them to the Occupy Movement collection. These might include movement-wide or city-specific websites, sites with images, blogs, YouTube videos, even Twitter accounts of individuals or organizations involved with the movement. No ideas or additions are too small or too large; perhaps your ideas or suggestions will be a unique part of the movement not yet represented in our collection. IA Archive-It friends and partners are already sending in seeds, which we greatly appreciate.

The web content captured in this collection will be included in the General Archive collection at http://www.archive.org/details/occupywallstreet, which has been actively collecting materials on the Occupy Movement for a few months.

Please send any seed suggestions, questions, or comments to Graham at graham@archive.org.