Category Archives: Image Archive

Meet the Librarians: Alexis Rossi, Media & Access

To celebrate National Library Week 2022, we are taking readers behind the scenes to Meet the Librarians who work at the Internet Archive and in associated programs.


Alexis Rossi has always loved books and connecting others with information. After receiving her undergraduate degree in English and creative writing, she became a book editor and then worked in online news. 

Alexis Rossi

In 2006, Rossi joined the staff of the Internet Archive. She was working on the launch of the Open Library project when she recognized the need to learn more about how to best organize materials. She enrolled at San Jose State University and earned her Master’s of Library and Information Science in 2010.

“It gave me a better grasp of how to hierarchically organize information in a way that is sensible and useful to other libraries,” Rossi said. “It also gave me better familiarity with how other more traditional libraries actually work—the types of data and systems they use.”

Rossi concentrated on web interfaces for library information, understanding digital metadata, and how to operate as a digital librarian. At the Internet Archive, in addition to overseeing the Open Library project, Rossi managed a revamp of the organization’s website, ran the Wayback Machine for four years, and founded the webwide crawling program. She is currently a librarian and director of media & access.

“One of the themes of my life is trying to empower people to do whatever they want to do,” said Rossi, who grew up in Monterey, California, and now lives in San Francisco. “Giving people the resources to teach themselves—whatever they want to learn—is my driving force.”


Rossi acknowledges she is privileged to have the means to avail herself of an abundance of information, while many in other parts of the world do not. There are so many societal problems she cannot solve, Rossi said, but she believes her work is making a contribution.

“We can build a library that allows people to access information for free, wherever they are, and however they can get to it, in whatever way. That, to me, is incredibly important,” Rossi said. It’s also rewarding to help patrons discover new information and recover materials they may have thought were lost, she added.

When she’s not working, Rossi enjoys making funky jewelry and elaborate cakes (a skill she learned on YouTube).

Among the millions of items and collections in the Internet Archive, what is Rossi’s favorite? Video and audio recordings of her dad, now 73, playing the piano, organ and accordion: “It’s just so good. It’s such a perfect little piece of history.”

Boston Phoenix Rises Again With New Online Access

For more than 40 years, The Boston Phoenix was the city’s largest alternative weekly, covering local politics, arts, and culture.

The Boston Phoenix, Volume 2, Issue 44 – October 30, 1973

“It was really a pretty legendary paper. The style of the writing and the quality of writers were nationally known,” said Carly Carioli, who started at the newspaper as an intern in 1993 and became its last editor-in-chief.

With the advent of online advertising, it struggled to compete, like many independent newspapers. In 2013, the Phoenix folded.

After the publication shut down, owner Stephen Mindich wanted the public to be able to access back issues of the Phoenix. The complete run of the newspaper from 1973 to 2013 was donated to Northeastern University’s special collections. The family signed copyright over to the university.

Librarians led a crowdsourcing project to create a digital index of all the articles and authors, which was helpful for historians and others in their research, said Giordana Mecagni, head of special collections and university archivist. Northeastern had inquired about digitizing the collection, but it was cost prohibitive. 

As it turns out, the Internet Archive owned the master microfilm for the Phoenix and it put the full collection online in a separate collection: The Boston Phoenix 1973-2013. Initially, the back issues were only available for one patron to check out at a time through Controlled Digital Lending. Once Northeastern learned about the digitized collection, it extended rights to the Archive to allow the Phoenix to be downloaded without controls.

Read The Boston Phoenix at the Internet Archive

“All of a sudden it was free to the public. It was wonderful,” Mecagni said. “We get tons and tons of research requests for various  aspects of the Phoenix, so having it available online for free for people to download is a huge help for us.” 

Inquiries range from someone trying to track down a classified ad through which they met their spouse to an individual looking up an article about a band. The paper was a leader in writing groundbreaking stories about the LGBTQ community, the AIDS crisis, race and the Vietnam War, issues often not covered in the mainstream press. “Making that coverage public is adding an immense amount to the historical record that would not be there otherwise,” said Carioli. He said he appreciates the preservation of, and easy access to, back issues, as do other journalists, researchers and academics.

“It’s a dream come true,” said Carioli of the Internet Archive’s digitization of the newspaper. “The Phoenix was invaluable in its own time, and I think it will be invaluable for a new generation who are just discovering it now. It was a labor of love then and the fact that it’s online now is huge for Boston, but also for anyone who’s interested in independent media and culture.”

Pretend you’re here with Internet Archive Zoom backgrounds

Have you seen these gorgeous library backgrounds you can use to pretend you’re amongst the smell of old books and hushed page turning?

When I saw them I got a little jealous and thought, “computers are just as soothing!” So without further ado, welcome to your Internet Archive virtual Zoom backgrounds.

We’ve got a pretty majestic building you could sit in front of. There’s free wifi.

Or you can come inside and sit in the Great Room with us, stained glass dome and all.

Sit quietly amongst the pews with our little Internet Archivist sculptures by Nuala Creed.

Or have them be your backup dancers / Greek chorus on all your calls.

You can sit amongst the films waiting to be digitized.

Or pretend to be digitizing them yourself.

Scan books seated in front of a Table Top Scribe.

Or sit with the constant hum of busy servers in the background.

Images of Afghanistan 1987-1994

Afghan Media Resource Center’s correspondent interviewing a Muj Commander, 1991

Journalists and others risk their lives to keep the public informed in times of conflict. War imagery provides us with important information in the moment, and creates a trove of invaluable archival content for the future.

Please be aware that this collection contains some disturbing photos of violence and its aftermath (though we have not included any in this blog post).

The Afghan Media Resource Center (AMRC) was founded in Peshawar, Pakistan, in 1987, by a team of media trainers working under contract to Boston University. The goal of the project was to assist Afghans to produce and distribute accurate and reliable accounts of the Afghan war to news agencies and television networks throughout the world. Beginning in the early 1980s, amidst a news blackout imposed by the Soviet-backed Kabul government, foreign journalists had become targets to be captured or killed. The AMRC was an effort to overcome the substantial obstacles encountered by media representatives in bringing events surrounding the Afghan-Soviet war to world attention.

An armed Muj posing for the camera, 1988

Beginning in 1987, a series of six-week training sessions was conducted at the AMRC’s original home in University Town, Peshawar, Pakistan. Qualified Afghans were recruited from all major political parties, all major ethnic groups and all regions of Afghanistan, to receive professional training in print journalism, photo journalism and video news production. Haji Sayed Daud, a former television producer and journalist at Kabul TV before the Soviet invasion, was named AMRC Director.

After the completion of their training, 3-person teams were dispatched on specific stories throughout Afghanistan’s 27 provinces, with 35mm cameras, video cameras, notebooks, and audio tape recorders. Photo materials were distributed internationally through SYGMA and Agence France-Presse (AFP). Video material was syndicated and broadcast by VisNews (now Reuters), with 150 broadcasters in 87 countries, Euronews and London-based WTN (now Associated Press), Thames Television, ITN, Swedish, French, Pakistani and other regional networks.

A young girl carrying clean drinking water, 1989

In 2000 AMRC began publishing a popular and influential newspaper in Kabul: ERADA (Intention). With one interruption, ERADA publication continued until 2012.

Beyond the AMRC archive, the AMRC conducted dozens of training programs and workshops for writers and radio journalists, including training programs for Refugee Women in Development (REFWID). The AMRC also established radio and TV studios in the provincial capital, Jalalabad, and produced radio and TV programs, including educational radio dramas, for a variety of international organizations. AMRC also conducted public opinion polls in Afghanistan, including an extensive Media Use Survey in Afghanistan, financed by InterMedia, a Washington, D.C. group.

Armed Muj pulling out an unexploded missile, 1989

The AMRC collection spans a critical period in Afghanistan’s history (1987–1994) and includes 76,000 photographs, 1,175 hours of video material, 356 hours of audio material, and many stories from print media.

An Afghan weaving carpet, 1990

In 2012 AMRC received a grant to digitize the entire AMRC archive, to preserve the collection at the U.S. Library of Congress. AMRC senior media advisors Stephen Olsson and Nick Mills were trained in the digitization processes by the Library of Congress, then spent two weeks in Kabul training the AMRC staff. The digitization and metadata sheets (in English, Dari and Pashto) were completed in 2016, and were welcomed into the Library of Congress with a formal ceremony.  We are now making the entire AMRC collection available through our on-line partner, The Internet Archive.

Now the entire collection is readily available to scholars, researchers and publishers.  All royalties for commercial use of the photo images and video material will continue to support the non-profit work of the AMRC.

30 Days of Stuff

Jason Scott, free-range archivist, reporting in as 2017 draws to a close.

As part of our end-of-year fundraising drive, I thought it might be fun to tweet highlighted parts of the vast stacks of content that the Internet Archive makes available for free to millions. A lot of folks know about our Wayback Machine and its 20+ years of website history, but there’s petabytes of media and works available to see throughout the site. I called it “30 Days of Stuff”, and for the last 30 days I’ve been pointing out great items at the Archive, once a day.

You won’t have to swim upstream through my tweets; here on the last day, I’ve compiled the highlighted items in this entry. Enjoy these jewels in the Archive’s collection, a small sample of the wide range of items we provide.

Books and Texts

  • The Latch Key of my Bookhouse was one of the first books scanned by the Internet Archive in its book scanner tests, and it’s a 1921 directory of Children’s Literature that is filled with really nice illustrations that came out great.
  • As part of our ever-growing Defense Technical Information Center collection, we have The Role of the Citizens Band Radio Service and Travelers Information Stations In Civil Preparedness Emergencies Final Report, a 1978 overview of CB Radio and what role it might play in civil emergencies. Many thousands of taxpayer-funded educational and defense items are mirrored in this collection.
  • Also in the DTIC collection is The Battalion Commander’s Handbook 1980, which besides the crazy front page of stamps, approvals and sign-offs, is basically a manager’s handbook written from the point of view of the US Army.
  • There are hundreds of tractor manuals at the Archive. Hundreds! Of all types, languages (a lot of them Russian) and level of information. Tractors are one of those tools that can last generations and keeping the maintenance on them in the field can make a huge difference in livelihood.
  • A lovely 1904 catalog for plums called The Maynard Plum Catalogue was scanned in with one of our partner organizations and it’s a breathless and inspiring declaration of the future wonder of the plums this wizard of plum-growing, Luther Burbank, was bringing to the world.
  • Xerox Corporation released “A Metamorphosis of Creative Copying” in 1964, which seems to function as both promotion for Xerox and a weird gift to give to your kids to color in.
  • In 2014, a short zine called The Tao of Bitcoin was released, telling people the dream of $10,000 bitcoin would be real.
  • The 1888 chapbook Goody Two-Shoes has lovely illustrations, and a fine short story.
  • Working with a lovely couple who brought in a 1942 black-owned-businesses directory, I scanned the pages by hand and put them up into this item.
  • Inside that directory was an ad for a school of whistling that said it taught using the methods of Agnes Woodward, and a quick scan of the Archive’s stacks showed that we had an entire copy of her book Whistling as an Art!
  • The medical treatise Sleep and Its Derangements, from 1869, is William A. Hammond, MD’s overview of sleep, and what can go wrong. Scanned from the Francis A. Countway Library of Medicine, it’s one of many thousands of books we’ve scanned with partners.
  • Let Hartman Feather Your Nest could be described as “A furniture catalog” in the same way the Sistine Chapel could be described as “a place of worship”. The catalog is a thundering, fist-pounding declaration of the superiority of the Hartman enterprise and the quality and breadth of furniture and service that will arrive at your door and be backed up to the far reaches of time.

Magazines

  • Photoplay considered itself the magazine for the motion picture industry in the first part of the 20th century, and this multi-volume compilation of photos, articles and advertisements is a truly lovely overview.
  • There’s over 140 issues of the classic Maximum RockNRoll zine, truly the king of music zines for a very long time. On its newsprint pages are howls and screeches of all manner of punk, rock and the needs of musicians.
  • A magazine created by the Walt Disney Company to trumpet various parts of Disneyland and its attractions was called Vacationland, and this Fall 1965 issue covers all sorts of stuff about the park’s first decade.

Movies

  • Rescued from a warehouse years ago, a collection of Hollywood movie “B-Roll”, unused secondary scenes often filmed by different crew, has been digitized. My personal favorite is [Western Film Scenes], which is circa 1950s footage of a Western Town, all of it utterly fake but feeling weirdly real, to be used in a western. Don’t miss everyone standing around looking right at you and looking like they agree quite energetically with you!
  • No compilation could be complete without the legendary Duck and Cover, a cartoon/PSA that explained the simple ways to avoid injury in a nuclear blast. Just lie down! It’ll be fine. Please note: This Probably Won’t Work. But the song is very catchy.
  • The very weird Electric Film Format Acid Test from 1990 has a semi-interested model holding up a color bar plate in a wide, wide variety of film and video formats. Filmed just a few blocks away from the Internet Archive’s current headquarters.
  • I snuck in a 1992 interview with the Archive’s founder, Brewster Kahle, back when he was 33 and working at WAIS, a company or two before the Archive, in which he is asked about his thoughts on information and gathering of data. It’s quite interesting to hear the consistency of thought.
  • The Office of War Information worked with Disney to create “Dental Health”, a film to show to troops about proper dental care. It’s a combination of straightforward animation and industrial film-making worth enjoying.

Audio

  • We have a collection of hours of the radio show The Shadow from 1938-1939, starring  Orson Welles at 23, at the height of his performance powers, playing the dual main role.
  • For Christmas Eve, we pointed to “Christmas Chopsticks”, a 1953 78rpm record of “Twas the Night Before Christmas” performed to the tune of the classic piano piece “Chopsticks”; one of tens of thousands of 78rpm records the Archive has been adding this year.
  • On Christmas, a user of the Archive uploaded two obscure albums he’d purchased on eBay – remnants of the S. S. Kresge Company, which became K-Mart, and which were played over the PA system for shoppers. He got his hands on Albums #261 and #294.
  • Earlier in the month before the user uploaded those Christmas albums, I linked to a different holiday collection of K-Mart items, a 1974 Reel-to-Reel that started with a K-Mart jingle and went full holiday from there.
  • Before he was a (retired) talk show host, and before he was a stand-up comedian, David Letterman worked and trained in radio. Happily, we have recordings of Dave Letterman, DJ, from when he was 22, at Ball State University.
  • Ron “Boogiemonster” Gerber has been hosting his weekly pop music recycling radio show, “Crap from the Past”, for over 25 years, and he’s been uploading and cataloging his show to the Archive for well over 10 of those years, including all the way back to the beginning of his show. The full Crap From The Past archive is up and is hundreds of hours of fun.
  • The truly weird “Conquer the Video Craze” is a 1982 record album with straightforward descriptions of how to beat games like Centipede, Defender, Stargate, Dig Dug, and more. This album has been sampled from by multiple DJs to bring that extra spice to a track.
  • Over 3,000 shows at the DNA Lounge are at the archive, including “Bootie: Gamer Night”, which combines mash-up tracks and video games. Bootie has been playing at DNA Lounge for years, and puts the audio from one song with the singing from another, and… it’s quite addicting, like games. This night was for the nearby Game Developers’ Conference being held the same week.

Software

  • In 2011, as part of a “retrocomputing” competition, we saw the release of “Paku-Paku”, a pac-clone program which ran in an obscure early PC-Compatible graphics mode that was very colorful and very small (160×100) and was built perfectly for it. You can play the game in your browser by clicking here.
  • Psion Chess is a game for the Macintosh that can play both you and itself with pretty high levels of skill and really sharp and crisp black and white graphics.  It makes a really great screensaver in self-playing mode.

People often overuse a phrase like “Barely scratched the surface”, but I assure you there are millions of amazing items in the archive, and it’s been a pleasure to bring some to light. While the 30 Days of Stuff was a fun way to stretch out a month of fundraising with stuff to see every day, we’re here 24/7 to bring you all these items, and welcome you finding jewels, gems and clunkers throughout our hard drives whenever you want.

Thanks for another year!

archive.org download counts for collections of items: updates and fixes

Every month, we look over the total download counts for all public items at archive.org.  We sum item counts into their collections.  At year end 2014, we found various source reliability issues, as well as overcounting for “top collections” and many other issues.

archive.org public items tracked over time


To address the problems, we:

  • Built a new system that uses our database (DB) for item download counts, instead of our less reliable (and more prone to “drift”) SOLR search engine (SE).
  • Changed monthly saved data from JSON and PHP serialized flatfiles to a new DB table (much easier to use now!)
  • Fixed overcounting issues for collections: texts, audio, etree, movies
  • Fixed various overcounting issues related to not unique-ing <collection> and <contributor> tags (more below)
  • Fixed character encoding issues on <contributor> tags

Bonus points!

  • We now track *all collections*.  Previously, we only tracked items tagged:
    • <mediatype> texts
    • <mediatype> etree
    • <mediatype> audio
    • <mediatype> movies
  • For items whose <contributor> tags we track (texts items), we now have a “Contributor page” that shows a table of historical data.
  • Graphs are now “responsive” (scale in width based on browser/mobile width)

 

The Overcount Issue for top collection/mediatypes

  • In the graph below, mediatypes and collections are shown horizontally, with a sample “collection hierarchy” as it stands today.
  • For each collection/mediatype, we show one example item (A, B, C, and D), with its downloads/streams/views count next to it in parentheses. So these are four items, spanning four collections, that happen to be in a collection hierarchy (a single item can belong to multiple collections at archive.org).
  • The Old Way had a critical flaw — it summed all sub-collection counts — when really it should have just summed all *direct child* sub-collection counts (or gone with our New Way instead)

overcount

So we now treat <mediatype> tags like <collection> tags for counting purposes, and we unique all <collection> tags on each item. That keeps items with slightly messy tag data (for example, the same tag listed twice) from causing another kind of overcounting.
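Here is a rough Python sketch of the difference, under some loud assumptions: the item names, collection names, download numbers, and the straight-line hierarchy are all made up for illustration, and the functions are not the actual archive.org code. It only shows why summing every sub-collection inflates a top collection, and why a per-item sum over unique tags does not.

```python
from collections import defaultdict

# Hypothetical items A-D as (downloads, tags); each lists every
# collection/mediatype tag it carries. Item D repeats a tag to mimic
# the non-unique tag problem.
ITEMS = {
    "A": (1, ["texts"]),
    "B": (10, ["texts", "americana"]),
    "C": (100, ["texts", "americana", "library_x"]),
    "D": (1000, ["texts", "americana", "library_x", "shelf_y", "shelf_y"]),
}

# Hypothetical hierarchy, parent -> direct children.
CHILDREN = {
    "texts": ["americana"],
    "americana": ["library_x"],
    "library_x": ["shelf_y"],
    "shelf_y": [],
}

# Downloads of the items whose deepest collection is the key.
OWN = {"texts": 1, "americana": 10, "library_x": 100, "shelf_y": 1000}


def rollup(collection):
    """Roll-up over direct children only: own count plus each child's roll-up."""
    return OWN[collection] + sum(rollup(c) for c in CHILDREN[collection])


def descendants(collection):
    """All sub-collections below `collection`, at any depth."""
    found = []
    for child in CHILDREN[collection]:
        found.append(child)
        found.extend(descendants(child))
    return found


def old_way(collection):
    """Flawed roll-up: adds the roll-up of every descendant, so deep items count repeatedly."""
    return OWN[collection] + sum(rollup(d) for d in descendants(collection))


def new_way(items):
    """Sum each item once per distinct tag, treating mediatypes like collections."""
    totals = defaultdict(int)
    for downloads, tags in items.values():
        for tag in set(tags):  # unique the tags so a repeated tag counts once
            totals[tag] += downloads
    return dict(totals)


print(old_way("texts"))         # 3211, inflated
print(rollup("texts"))          # 1111, direct-children roll-up
print(new_way(ITEMS)["texts"])  # 1111, per-item sum over unique tags
```

In this toy data the deepest item (D) is counted three times by the flawed roll-up, which is where the inflated total comes from. Without the `set(...)` step, D's repeated tag would also double its contribution to `shelf_y`.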

 

… and one more update from Feb/1:

We graph the difference between the absolute download counts for the current month and the prior month, for each month we have data for. This gives us graphs that show downloads/month over time. However, values can easily go *negative* in various scenarios (which is *wickedly* confusing to our poor users!)

Here’s that situation:

A collection has a really *hot* item one month, racking up downloads for that collection. The next month, a DMCA takedown or some other action removes the item from being available (and thus from being counted in the future). The downloads for that collection can then plummet in the next month’s run, when the counts are summed over the collection’s public items again. So that collection would show a negative (net) change in downloads for that month!

Here’s our fix:

Use the current month’s collection “item membership” list for both the current month *and* the prior month. Sum the counts for those items in each month, and graph the difference between the two sums. In just about every situation that remains, the graphed monthly download counts will be nonnegative (positive or zero), because the totals being compared only grow over time.
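As a small illustration of that fix, here is a hedged Python sketch. The item names and numbers are invented, the function names are hypothetical, and treating an item with no prior-month record as 0 is an assumption of this example rather than something the real system is known to do.

```python
def naive_delta(current_members, prior_members, current_counts, prior_counts):
    """Old approach: each month is summed over that month's own public items."""
    current_total = sum(current_counts[i] for i in current_members)
    prior_total = sum(prior_counts[i] for i in prior_members)
    return current_total - prior_total


def fixed_delta(current_members, current_counts, prior_counts):
    """Fix: use the current membership list for both months."""
    current_total = sum(current_counts[i] for i in current_members)
    # Assumption: an item with no prior-month record counts as 0 for that month.
    prior_total = sum(prior_counts.get(i, 0) for i in current_members)
    return current_total - prior_total


# Cumulative download counts per item (made-up numbers).
prior_counts = {"hot_item": 5000, "steady_item": 200}
current_counts = {"steady_item": 260, "new_item": 40}

prior_members = ["hot_item", "steady_item"]
current_members = ["steady_item", "new_item"]  # hot_item was taken down

print(naive_delta(current_members, prior_members, current_counts, prior_counts))  # -4900
print(fixed_delta(current_members, current_counts, prior_counts))                 # 100
```

The removed item drops out of both sums, so its old downloads no longer show up as a loss for the collection.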

 

 

Archive-It Team Encourages Your Contributions To The “Occupy Movement” Collection

Since September 17th, 2011 when protesters descended on Wall Street, set up tents, and refused to move until their voices were heard, an impassioned plea for economic and social equality has manifested itself in similar protests and demonstrations around the world. Inspired by “Occupy Wall Street (OWS)”, these global protests and demonstrations are collectively now being referred to as the “Occupy Movement”.

In an effort to document these historic, and politically and socially charged, events as they unfold, IA’s Archive-It team has recently created an “Occupy Movement” collection to begin capturing information about the movement found online. With blogs communicating movement ideals and demands, social media used to coordinate demonstrations, and news-related websites portraying the movement from a dizzying variety of angles, the presence and representation of the Occupy Movement online is both hugely valuable to our understanding of the movement as a whole and constantly in flux and at risk.

The value of the collection hinges on the diversity, depth, and breadth of our seeds and websites we crawl. We are asking and encouraging anyone with websites they feel are important to archive, sites that tell a story about the movement, to pass them along and we will add them to the Occupy Movement collection. These might include movement-wide or city-specific websites, sites with images, blogs, YouTube videos, even Twitter accounts of individuals or organizations involved with the movement. No ideas or additions are too small or too large; perhaps your ideas or suggestions will be a unique part of the movement not yet represented in our collection. IA Archive-It friends and partners are already sending in seeds, which we greatly appreciate.

The web content captured in this collection will be included in the General Archive collection at http://www.archive.org/details/occupywallstreet
which has been actively collecting materials on the Occupy Movement for a few months.

Please send any seed suggestions, questions, or comments to Graham at graham@archive.org.

The Awesomeness of Yosemite

Just back from a stay in Yosemite Valley. Just awesome…as it always has been.

So of course I came back and had to check on some of the history and other interesting information about the valley at the Archive. There’s a wealth of stuff found by simply searching “yosemite”.


This one from 1905 is one of the earliest with photos.  Lots of changes in the man-made aspects of the valley but not to the natural landforms that are so familiar and, well, awesome in the real sense of the word.

http://www.archive.org/details/discoveryofyosem01bunn
This 3rd edition from 1897 gives an account of the Indian Wars that led to the discovery of Yo-Semite.

http://www.archive.org/details/yosemiteitshisto00lest
This one from 1873 might be the oldest book we have on Yosemite.

http://www.archive.org/details/yosemite00unkngoog
And of course there is “The Yosemite” by John Muir from 1912.

A great trip as always. The valley may be more crowded than a century ago but the experience is still inspiring and …awesome.

-Jeff Kaplan

NASA on The Commons

From nasaimages.org, a service of Internet Archive:

Internet Archive, NASA, and Flickr are together launching NASA on The Commons, a new way to view and interact with photos from NASA. NASA on The Commons invites the public to contribute information and knowledge to curated photo sets provided by nasaimages.org.  Visitors will be able to add tags, keywords, and annotations to three compilations of images curated by the New Media Innovation Team at NASA Ames and NASA photography and history experts across the Agency. The three collections, spanning more than half a century of NASA history, include: Launch and Takeoff, Building NASA, and Center Namesakes.

“NASA’s long-standing partnership with Internet Archive and this new one with Yahoo!’s Flickr provides an opportunity for the public to participate in the process of discovery,” said Debbie Rivera, lead for the NASA Images project at the agency’s headquarters in Washington. “In addition, the public can help the agency capture historical knowledge about missions and programs through this new resource and make it available for future generations.”

NASA on The Commons will make the NASA Images collection accessible to a wider audience while improving the information that accompanies these images with the help of the public.
Read more about NASA on The Commons.