Salt Lake County Health Department
Automated Incorporation of Yelp Restaurant Reviews into Foodborne Illness Surveillance and Response
With 1,107,314 residents, Salt Lake County is the most populous county in the state of Utah, accounting for approximately 37% of the state's total population (US Census). Salt Lake County's local health department (LHD) was an outgrowth of the Society of Health established by Brigham Young in 1849, which later became the Salt Lake City Health Department and merged with county government in 1969 to form the Salt Lake City-County Health Department. Its name was changed to Salt Lake County Health Department (SLCoHD) in 2013.
SLCoHD protects the county's population from public health threats, including foodborne outbreaks. According to the Centers for Disease Control and Prevention (CDC), approximately 1 in 6 Americans become ill with foodborne diseases each year. Salt Lake County has an active restaurant community, with approximately 4,000 food establishments subject to inspection by the health department. It is important for individuals visiting restaurants or other food venues to be able to report foodborne illnesses and suspected causes to enable outbreaks to be identified and responded to quickly and effectively.
In 2014, SLCoHD sought to supplement its existing foodborne illness complaint reporting systems by identifying publicly posted Yelp reviews that were consistent with foodborne illness. The primary goal was to increase the number of foodborne illnesses being captured for surveillance without unsustainably increasing the staff time and resources required to find them. Objectives included exploration of collaboration opportunities, completion of a pilot period to determine feasibility and effectiveness, and evaluation of the impacts of implementation on surveillance outcomes and practices. Meetings were held in early 2015 with SLCoHD stakeholders from Epidemiology, Food Protection, and Administration, as well as two social media researchers from the University of Utah. Ultimately it was decided that collaboration with the university would not be feasible due to financial constraints, but Epidemiology staff were able to independently develop a Python computer program utilizing the Yelp application program interface (API). The program was run automatically twice weekly. Epidemiology staff evaluated the reviews it flagged as likely foodborne illness, and plausible reviews were included in existing criteria for prioritizing food establishment inspections. Over 22 months of pilot testing and implementing the practice, Yelp reviews accounted for a 17% increase in the total number of complaints used for surveillance, and these additional complaints led directly to an 11% increase in the number of foodborne illness-related requests for inspection sent by Epidemiology to Food Protection. The resulting inspections identified and corrected critical violations at the implicated establishments and led to the closure of one establishment associated with a foodborne outbreak. There was no financial investment required for this practice, and the benefits to SLCoHD were found to be worth the staff time invested.
The program was continued after the conclusion of the pilot and has now been performing consistently for almost two full years.
The public health issue that needed to be addressed in Salt Lake County was detection of foodborne outbreaks from foodborne illness complaints. The Council to Improve Foodborne Outbreak Response (CIFOR) recommends that LHD foodborne illness complaint reporting systems collect enough complaints to detect more than 20 outbreaks per 1,000 complaints. In 2014, SLCoHD received 456 foodborne illness complaints, or approximately 412 per million county residents, and identified 25 potential outbreaks, amounting to about 55 suspect outbreaks per 1,000 complaints. While this appears to exceed CIFOR's recommendation, it can be difficult to determine whether or not a facility is the actual cause of a cluster of foodborne illness complaints, and past estimates of the number of actual outbreaks indicated that Salt Lake County fell short of the recommendation. However, outbreak occurrence varies widely from year to year, so SLCoHD uses the more stable criterion of two or more unrelated complaints within a 14-day period to initiate an outbreak investigation. This criterion likely captures a number of false positive outbreaks, but ensures that facilities most likely to be causing illness are prioritized by inspectors. These potential outbreaks were used as a surrogate measure of the number of outbreaks detected from foodborne illness complaints for this analysis. Given CDC's estimate that about 1 in 6 people become ill with a foodborne disease each year, these complaints likely represent only a small fraction of actual illnesses (about 1 complaint per 400 estimated foodborne illnesses county-wide). Increasing the number of complaints would be expected to increase the number of potential foodborne outbreaks detected, and thus better ensure the safety of Salt Lake County residents from foodborne illnesses.
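The rates quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the figures given in this document; the variable and function names are illustrative and are not taken from the SLCoHD program.

```python
# Figures quoted in the text (Salt Lake County, 2014).
POPULATION = 1_107_314
COMPLAINTS_2014 = 456
POTENTIAL_OUTBREAKS_2014 = 25
CDC_ANNUAL_ILLNESS_FRACTION = 1 / 6   # CDC estimate: about 1 in 6 people per year
CIFOR_TARGET_PER_1000 = 20            # CIFOR recommendation: more than 20 per 1,000


def rate(count, denominator, per):
    """Express count/denominator as a rate per `per` units."""
    return count / denominator * per


complaints_per_million = rate(COMPLAINTS_2014, POPULATION, 1_000_000)
outbreaks_per_1000 = rate(POTENTIAL_OUTBREAKS_2014, COMPLAINTS_2014, 1_000)

# Expected illnesses per year if the CDC fraction applies to the county.
estimated_illnesses = POPULATION * CDC_ANNUAL_ILLNESS_FRACTION
illnesses_per_complaint = estimated_illnesses / COMPLAINTS_2014

print(f"Complaints per million residents: {complaints_per_million:.0f}")
print(f"Suspect outbreaks per 1,000 complaints: {outbreaks_per_1000:.0f}")
print(f"Estimated illnesses per complaint: {illnesses_per_complaint:.0f}")
print(f"Exceeds CIFOR target: {outbreaks_per_1000 > CIFOR_TARGET_PER_1000}")
```

Under these inputs, roughly one complaint is received for every 400 estimated illnesses, which is the gap that additional complaint sources are meant to narrow.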
However, SLCoHD already had a robust foodborne illness surveillance system in place, receiving foodborne illness complaints from a variety of sources. The SLCoHD Food Protection Bureau receives telephone complaints from the public during business hours, and a smaller number of telephone complaints are received in the SLCoHD Epidemiology Bureau. The Utah Department of Health receives online foodborne illness complaints on its IGotSick website, which are forwarded to the LHDs where the complainants report that they reside. The Utah Poison Control Center also forwards reports they receive that may have to do with foodborne illness. When feasible, the SLCoHD Epidemiology Bureau follows up on these reports to obtain full exposure information. As described in a previous model practice, SLCoHD has combined these exposures from foodborne illness complaints with exposures identified during reportable disease investigations for food establishment-associated outbreak detection since 1998 (NACCHO Model Practices Database, 2006). This is another likely reason for the already high number of suspect outbreaks per 1,000 complaints detected by SLCoHD, since the inclusion of reportable disease exposures increases the likelihood of identifying clusters without increasing the number of complaints. Currently, when two or more complainants or reported cases name the same food establishment within a 14 day period, Epidemiology starts an investigation and sends a request to Food Protection for an onsite environmental health assessment. This allows potential outbreaks to be identified and responded to rapidly while restricting the deployment of food inspectors to those establishments most likely to actually be causing illness. Increasing the number of foodborne illness complaints would not change this process, but would increase its reach, potentially increasing the number of foodborne outbreaks that are identified and associated with food establishments.
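The trigger described above (two or more reports naming the same establishment within a 14-day period) can be sketched as a small function. This is an illustration only, not the SLCoHD procedure: the function name, the `(establishment, date)` record layout, and the reading of "within a 14-day period" as a date difference of at most 14 days are all assumptions, and each report is assumed to come from an unrelated complainant or case.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_clusters(reports, window_days=14, min_reports=2):
    """Return establishments named in >= min_reports reports within window_days.

    `reports` is an iterable of (establishment_name, report_date) pairs.
    Each pair is assumed to represent an unrelated complainant or case.
    """
    dates_by_establishment = defaultdict(list)
    for establishment, report_date in reports:
        dates_by_establishment[establishment].append(report_date)

    window = timedelta(days=window_days)
    flagged = set()
    for establishment, dates in dates_by_establishment.items():
        dates.sort()
        # Compare each report with the one (min_reports - 1) positions later;
        # if that span fits inside the window, the establishment is flagged.
        for i in range(len(dates) - min_reports + 1):
            if dates[i + min_reports - 1] - dates[i] <= window:
                flagged.add(establishment)
                break
    return flagged


# Made-up example: Cafe A has two reports 9 days apart, Diner B 31 days apart.
example_reports = [
    ("Cafe A", date(2015, 3, 1)),
    ("Cafe A", date(2015, 3, 10)),
    ("Diner B", date(2015, 3, 1)),
    ("Diner B", date(2015, 4, 1)),
]
print(find_clusters(example_reports))
```

Because complaints and reportable disease exposures feed the same list, adding a new complaint source such as Yelp only adds rows to `reports`; the rule itself does not change.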
Since a variety of traditional sources were already being used, it made sense to look at nontraditional sources of foodborne illness complaints. The New York City Department of Health and Mental Hygiene (DOHMH) conducted a pilot study during 2012 and 2013 using Yelp reviews to identify unreported cases of foodborne illness (MMWR 2014;63(20):441-445). Using a private data feed based on an agreement between Yelp and DOHMH, text classification programs developed through an additional agreement with Columbia University analyzed reviews and identified those most likely to be associated with foodborne illness. The program was deemed successful, and about half of the reviews identified by the algorithm were found to be consistent with foodborne illness during the previous four weeks. SLCoHD initially reached out to the University of Utah to attempt a similar approach, but ultimately developed its own innovative methodology using Python software (www.python.org) and Yelp's application program interface (API) that allowed foodborne illness-related reviews to be identified without external agreements or financial commitments. While Yelp reviews are only known to have been used for foodborne illness surveillance in a very small number of jurisdictions, Yelp's freely accessible API is not known to have been used for foodborne surveillance in any other jurisdictions. The Python program developed by SLCoHD uses existing Yelp API functionality to identify reviews consistent with foodborne illness in food facilities located within Salt Lake County, without sophisticated machine learning algorithms or expensive software. This makes it easily portable to other jurisdictions, with only the geographical area to be searched needing to be changed.
SLCoHD serves more than a million people, and Yelp is accessible as a commercial website to all residents of Salt Lake County who have access to an internet connection. Though SLCoHD's other complaint reporting systems perform admirably, Yelp provides another avenue for receiving foodborne illness complaints, and thereby identifying more outbreaks. This practice is closely aligned with the Food Safety focus area of CDC's Winnable Battles.
The primary goal of this practice was to identify additional foodborne illness complaints in Yelp in such a way that the number of new complaints justified the staff time and resources invested in finding them. Objectives included exploration of collaboration opportunities, completion of a pilot period to determine feasibility and effectiveness, and, if warranted, automation of the search function and full implementation of the practice, followed by evaluation of the impacts on surveillance outcomes and practices. In terms of outcomes, the intent of the practice was to bring SLCoHD performance measures into greater alignment with CIFOR recommendations, specifically by increasing the number of suspect outbreaks identified from foodborne illness complaints, and to reduce the impact of such outbreaks by identifying more critical violations and implementing more interventions. While there were no other specific quantitative targets set in advance, SLCoHD Epidemiology Bureau staff responsible for foodborne illness surveillance and response were consulted for feedback throughout the pilot period and subsequent implementation.
An initial meeting was held in early January 2015 to discuss the New York City DOHMH approach and decide whether it was something SLCoHD should pursue. SLCoHD stakeholders from Epidemiology, Food Protection, and Administration were in attendance. A contact at the University of Utah with interest in social media research was identified and plans made to pursue collaboration between SLCoHD and the university. Immediately after the meeting, the Yelp API was discovered, and Epidemiology staff began exploratory development of a Python program to identify food establishments in Salt Lake County with Yelp reviews containing the term "food poisoning." The response from the API was saved in a text file which would be compared to the API response at the next query in order to identify new reviews, since the API only returns associated facilities and does not identify individual reviews by date. The program was manually run twice weekly in conjunction with SLCoHD's ongoing foodborne illness surveillance protocol. While this was a simplistic approach compared to the text classification algorithm employed by DOHMH, its successful identification of foodborne illness reviews resulted in the decision to begin a pilot period of unspecified duration to see how many complaints could be identified over time. It was decided that no follow-up would be attempted with the individuals posting the Yelp reviews, but plausible reviews would be included in SLCoHD foodborne illness surveillance activities and in criteria for inspection requests to be sent to Food Protection.
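The snapshot comparison described above can be sketched as follows. This is a minimal illustration rather than the SLCoHD program: the file name, the function names, and the one-record-per-line layout (here a business identifier and a review excerpt joined by `|`) are assumptions, and the records are made up instead of being fetched from the Yelp API.

```python
import tempfile
from pathlib import Path


def load_snapshot(path):
    """Read the previous run's records (one per line); empty set on first run."""
    path = Path(path)
    if not path.exists():
        return set()
    return set(path.read_text(encoding="utf-8").splitlines())


def save_snapshot(path, records):
    """Overwrite the snapshot file with the current run's records."""
    Path(path).write_text("\n".join(sorted(records)), encoding="utf-8")


def find_new_records(path, current_records):
    """Return records absent from the last snapshot, then update the snapshot."""
    current = set(current_records)
    new = sorted(current - load_snapshot(path))
    save_snapshot(path, current)
    return new


# Demonstration with made-up single-line records standing in for API output.
with tempfile.TemporaryDirectory() as tmp:
    snapshot_file = Path(tmp) / "last_response.txt"
    first_run = find_new_records(
        snapshot_file, ["cafe-a|got food poisoning here"]
    )
    second_run = find_new_records(
        snapshot_file,
        ["cafe-a|got food poisoning here", "diner-b|food poisoning after lunch"],
    )
    print(first_run)   # everything is new on the first run
    print(second_run)  # only the record not seen in the previous run
```

A set difference of this kind only shows what changed between two runs. Deciding whether a newly surfaced review describes a recent illness remains a manual step, as the text describes.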
A second meeting was held in early February where the Python program was demonstrated and initial results presented, and the decision was made to continue to pilot the program while also continuing attempts to collaborate with the university. Roles and responsibilities were assigned such that Epidemiology would evaluate reviews and send inspection requests, Food Protection would conduct environmental investigations identified using the reviews, and public information officers in Administration would handle any needed communication with Yelp and set up social media accounts for staff as needed. A week later, a conference call with two University of Utah professors was held. They expressed great interest in the project, but stated that the collaboration would be contingent on funding for a graduate student to work on the project. Ideas for applying for grants and scholarships were discussed, but ultimately it was decided that pursuing the collaboration further would not be feasible because of the financial constraints.
The pilot was completed at the end of June, and the results were evaluated and found to be favorable. Though there had been plans to reach out to Yelp if needed, the pilot demonstrated that Yelp's API provided sufficient functionality to make any direct contact unnecessary. With approval from all stakeholders, the practice was fully implemented in July 2015. Windows Task Scheduler was used to run the Python program automatically on Monday and Thursday mornings each week, with links to the reviews on Yelp's website automatically saved to a shared drive accessible to assigned Epidemiology investigators for review of identified complaints. The practice has continued largely uninterrupted since then, with the program running automatically twice weekly with minimal upkeep and assigned investigators routinely verifying Yelp reviews and including them in their surveillance. The evaluation detailed below covers the six month pilot period from January through June of 2015 and 16 months of implementation from July 2015 through the end of October 2016. No media promotion or other efforts were made to publicize this practice, and there were no start up or in-kind costs.
During the 22 months of piloting and implementing the practice, 149 Yelp reviews were identified that were consistent with foodborne illness during the previous month, resulting in nine requests for inspection (60 per 1,000 complaints) sent to the SLCoHD Food Protection Bureau for implicated food establishments. During this time period, 491 foodborne illness complaints from Salt Lake County residents were received from UDOH's IGotSick website, 333 telephone complaints from the SLCoHD Bureau of Food Protection, 38 from the Utah Poison Control Center, and 31 from the SLCoHD Epidemiology Bureau, for a combined 893 non-Yelp complaints, and a total of 1,042 complaints altogether, resulting in 92 total requests for inspection. Overlap between reporting systems was minimal, with no Yelp reviews known to have also been reported elsewhere, though the absence of full last names in Yelp made matching less likely. Overall, the inclusion of Yelp reviews resulted in a 17% increase over the number of complaints that would have been used for surveillance otherwise, and these additional complaints led to an 11% increase in the number of foodborne illness-related requests for inspection sent to Food Protection. The rate of suspect outbreak detection increased from 55 suspect outbreaks per 1,000 complaints in 2014 to 88 suspect outbreaks per 1,000 complaints during the 22 months of using Yelp reviews, though this increase is likely due in large part to year-to-year fluctuations in foodborne illness reporting. Two inspection requests were declined by Food Protection due to insufficient information in the complaints, low risk classification of the implicated food establishments, or recent routine inspection. 
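The percentages and rates in this paragraph follow directly from the counts it reports. The sketch below recomputes them; the variable names are illustrative, and the overall rate is calculated as total requests for inspection per 1,000 complaints, which appears to be the figure the text reports as suspect outbreaks per 1,000 complaints.

```python
# Complaint counts by source over the 22-month period, as reported in the text.
NON_YELP_COMPLAINTS = {
    "UDOH IGotSick website": 491,
    "Food Protection telephone": 333,
    "Utah Poison Control Center": 38,
    "Epidemiology Bureau": 31,
}
YELP_COMPLAINTS = 149
YELP_REQUESTS = 9      # inspection requests attributable to Yelp reviews
TOTAL_REQUESTS = 92    # all inspection requests, including the Yelp ones

non_yelp_total = sum(NON_YELP_COMPLAINTS.values())
all_complaints = non_yelp_total + YELP_COMPLAINTS

# Increase relative to what would have been available without Yelp.
complaint_increase_pct = YELP_COMPLAINTS / non_yelp_total * 100
request_increase_pct = YELP_REQUESTS / (TOTAL_REQUESTS - YELP_REQUESTS) * 100

yelp_requests_per_1000 = YELP_REQUESTS / YELP_COMPLAINTS * 1000
overall_requests_per_1000 = TOTAL_REQUESTS / all_complaints * 1000

print(f"Non-Yelp complaints: {non_yelp_total}")
print(f"All complaints: {all_complaints}")
print(f"Increase in complaints: {complaint_increase_pct:.0f}%")
print(f"Increase in inspection requests: {request_increase_pct:.0f}%")
print(f"Yelp requests per 1,000 Yelp complaints: {yelp_requests_per_1000:.0f}")
print(f"Requests per 1,000 complaints overall: {overall_requests_per_1000:.0f}")
```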
The seven other facilities identified by including Yelp reviews received onsite environmental health assessments, uncovering a total of 46 critical violations, including improper cooling methods, food temperature violations, cross-contamination issues, inadequate employee handwashing, and dirty equipment. Violations were significant enough at one facility that a follow-up inspection was performed a week later. Another facility was closed after additional reports of illness were received by the health department in the days after the initial inspection. The number of unrelated complaints received for this facility and findings of the inspectors indicated that it was an actual outbreak of foodborne illness caused by the facility. Because of the inclusion of Yelp reviews, this outbreak was detected six days earlier than it would have been otherwise, likely resulting in the closure of the facility earlier than would have occurred otherwise. Altogether, the inclusion of Yelp reviews in surveillance activities directly resulted in the identification and correction of at least one foodborne outbreak and a large number of critical violations that may have caused outbreaks or other adverse health effects. It is impossible to know exactly how many illnesses were prevented by these activities, but there is no question that they were beneficial to the public health of Salt Lake County.
Epidemiology staff members were supportive of the practice and gave positive feedback. A subset of three months of Yelp reviews collected by the program was analyzed in the fall of 2015. During that time, 632 Yelp reviews were evaluated for consistency with foodborne illness during the previous month. Of these, only 2.7%, or about 1 in 37, were confirmed. This is far lower than the rate of better than 50% achieved by DOHMH. However, due to the Yelp API's inability to receive queries by review date, the majority were old foodborne illness-associated reviews that took only seconds to rule out. Furthermore, the use of the API removed the employee burden of collecting and preparing the data and the initial time required for developing and fine-tuning machine learning algorithms. In sessions held twice a week over the three months, staff members evaluated an average of 24 reviews per session, with each session taking approximately 5-10 minutes to complete. This was deemed by staff to be a low-burden effort that could reasonably be incorporated into other ongoing foodborne illness surveillance activities. The original goal of increasing foodborne illness complaint detection without unduly burdening staff was determined to have been achieved.
While there was no financial investment required for this practice, several resources were necessary that may not be available at every LHD. The Python program used for this practice is specific to the Salt Lake County geographic area, so it would be necessary to have at least some initial assistance from someone proficient in Python or other similar coding languages and geographic information concepts in order to adapt the program to another jurisdiction. Furthermore, due to the Yelp API's restriction to no more than 20 businesses returned per query, the Salt Lake County geographic area had to be subdivided into smaller geographic areas to permit full coverage. As more businesses are identified over time, these areas periodically need to be subdivided further and the code changed accordingly to capture all reported businesses, so some limited technological support would likely be needed on an ongoing basis. The Python software used and the Yelp API are freely available. Windows Task Scheduler is not free on its own, but it is included with the Microsoft Windows operating systems already in use by Salt Lake County and is likely already available in most jurisdictions. While 5-10 minutes of staff time twice weekly was determined to be a low burden in this case, it may not be considered trivial for smaller LHDs with fewer resources. However, the benefits have been found to be worth the time commitment for SLCoHD, and it is anticipated that this practice will continue into the foreseeable future.
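The subdivision problem described above can be illustrated with a short sketch. The text describes areas being subdivided by hand and the code edited accordingly; the version below instead splits an area automatically whenever a query comes back full, so it is a variant technique and not the SLCoHD program. The `search` callable, the bounding-box tuple layout, and the sample coordinates are all hypothetical, and no real Yelp API call is made.

```python
def collect_businesses(bbox, search, cap=20, max_depth=8):
    """Collect all businesses in bbox when each query returns at most `cap`.

    bbox is (south, west, north, east). `search` is any callable that takes
    a bbox and returns a list of business identifiers, truncated at `cap`.
    A full result suggests truncation, so the box is split into quadrants.
    """
    results = search(bbox)
    if len(results) < cap or max_depth == 0:
        return set(results)

    south, west, north, east = bbox
    mid_lat = (south + north) / 2
    mid_lon = (west + east) / 2
    quadrants = [
        (south, west, mid_lat, mid_lon),
        (south, mid_lon, mid_lat, east),
        (mid_lat, west, north, mid_lon),
        (mid_lat, mid_lon, north, east),
    ]
    found = set()
    for quadrant in quadrants:
        found |= collect_businesses(quadrant, search, cap, max_depth - 1)
    return found


# Hypothetical stand-in for the API: 50 made-up businesses on a grid.
FAKE_BUSINESSES = {
    f"biz{i}": (40.5 + (i % 10) * 0.03, -112.1 + (i // 10) * 0.05)
    for i in range(50)
}


def fake_search(bbox, cap=20):
    """Return up to `cap` fake businesses whose coordinates fall inside bbox."""
    south, west, north, east = bbox
    inside = [
        name
        for name, (lat, lon) in FAKE_BUSINESSES.items()
        if south <= lat <= north and west <= lon <= east
    ]
    return inside[:cap]


county_box = (40.4, -112.2, 40.9, -111.8)  # illustrative coordinates only
all_found = collect_businesses(county_box, fake_search)
print(len(fake_search(county_box)), "from a single capped query")
print(len(all_found), "after recursive subdivision")
```

The bounds are inclusive on every side, so a business lying exactly on a dividing line may be returned by two quadrants; collecting results in a set removes such duplicates. The `max_depth` limit guards against endless splitting when more than `cap` businesses share essentially the same location.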